
THEORY 

OF 

PROBABILITY 


LONDON 

Cambridge  University  Press 

FETTER LANE

NEW  YORK  •   TORONTO 
BOMBAY  •  CALCUTTA  •  MADRAS 

Macmillan 

TOKYO 

Maruzen  Company  Ltd 


All  rights  reserved 


THEORY

OF

PROBABILITY 


BY  THE  LATE 

WILLIAM  BURNSIDE 

Sc.D., LL.D., F.R.S.

HONORARY  FELLOW  OF 
PEMBROKE  COLLEGE,  CAMBRIDGE 


CAMBRIDGE 

AT  THE  UNIVERSITY  PRESS 
1936 




First  edition     1928 
Reprinted     1936 


PRINTED  IN  GREAT  BRITAIN 


PREFACE 

THE  present  small   volume  on  the  theory  of  probability 
represents  a  manuscript   which    Professor   Burnside    had 
practically  completed  some  time  before  his  death. 

The  theory  had  begun  to  occupy  his  thought  during  the  war ; 
and  the  earliest  (1918)  of  his  papers,  relating  to  any  of  its 
topics,  deals  with  what  is  manifestly  a  military  question, 
reduced  (for  purposes  of  calculation)  to  a  purely  mathematical 
form.  As  was  his  wont  in  any  subject,  his  interest  in  its 
developments  grew:  a  number  of  isolated  papers  by  him 
appeared  from  time  to  time,  in  a  widening  range  of  treatment. 
Ultimately,  he  set  himself  to  make  a  systematic  account  of  the 
theory  as  it  presented  itself  to  him. 

So  far  as  can  be  remembered  by  Mrs  Burnside,  the  draft  was 
written  at  intervals  before  the  middle  of  1925.  At  the  time 
when  he  had  finished  his  account,  it  contained  all  the  issues 
which  he  proposed  to  discuss:  but  marginal  references  in  the 
manuscript  shew  that  he  intended  to  add  a  number  of  Notes 
elucidating  or  establishing  statements  in  the  text.  Of  these 
Notes,  only  one*  was  actually  written;  and  no  memoranda  have 
been  found  which  might  have  indicated  the  intended  range  of 
the  remainder.  His  work  was  interrupted  by  a  serious  illness 
late  in  1925.  After  a  recovery  which  was  only  partial,  he 
occasionally  longed  to  return  to  the  draft,  so  as  to  make 
additions  and  amplifications:  but  the  necessary  strength  was 
lacking.    The  manuscript  remained  unaltered. 

It  has  seemed  desirable  to  publish  his  draft  exactly  as  he 
left  it.  The  Syndics  of  the  Cambridge  University  Press  have 
been  willing  to  undertake  the  publication;  and  Mrs  Burnside 
desires  me  to  express  her  thanks  to  the  Syndics  for  their 
action. 

*  It  occupies  pp.  101,  102  of  the  volume. 

At the request of Mrs Burnside, and by the willing acquiescence
of the Syndics, my notice of Professor Burnside which was
written  for  the  Royal  Society  is  prefixed  to  the  volume.  And 
I  have  appended  (p.  104)  a  list  of  the  papers  which,  in  his 
later  years,  he  published  on  questions  cognate  with  the  range 
of  the  volume. 

During  the  progress  of  the  printing,  I  have  owed  much  to  the 
Secretary  of  the  Syndics,  Mr  S.  C.  Roberts,  and  to  the  Staff  of 
the  University  Press,  for  their  unstinted  and  ready  help ;  and  I 
return to them my sincere thanks for their considerate co-operation.

A.  R.  FORSYTH 

March  1928 


CONTENTS 


PAGE

Memoir of William Burnside ........ xi


THEORY OF PROBABILITY

CHAPTER I

INTRODUCTION

1. Preliminary explanations ........ 1
2. Rule for calculable probabilities, with three fundamental inferences ........ 4
3, 4. Probability under repeated trials ........ 6
5. Relations between different probabilities ........ 10
6. Discussion of equality of likelihood ........ 13

CHAPTER II

DIRECT CALCULATION OF PROBABILITIES

7. Various examples:
    I–IV: cards ........ 15
    V–VII: selections from groups ........ 18
    VIII: divisible integers ........ 19
    IX: proper assortments ........ 21
    X, XI: division of lines ........ 22
    XII: spinning of a coin ........ 25

CHAPTER III

INDIRECT METHODS OF CALCULATING PROBABILITIES

8. Problems of "duration of play": calculation made to depend on a difference-equation ........ 26
9. Examples:
    I: a game for three ........ 29
    II, III: spinning of a coin ........ 31
    IV: progressive game with counters ........ 35
    V: game of substituting for drawn objects ........ 38


CHAPTER IV

METHODS OF APPROXIMATION

10. Necessity for approximation when number of trials is very large ........ 40
11. Two examples, A and B, of approximate results for numerous spins of a coin ........ 40
12. Example, C, of similar problem with less simple alternatives ........ 43
13. Probable value: most probable value ........ 45
14. Three examples, D, E, F, connected with spun coins ........ 45
15. Same problem with more complicated associated conditions ........ 50

CHAPTER V

PROBABILITY OF CAUSES

16. Relative probability of causes contributing to an event: Bayes' formula, with example I ........ 55
17. Preliminary assumptions needed for the use of the Bayes formula: examples II, III, IV ........ 56
18. Example V; Probability of a sum with an inaccurate calculator ........ 60
19. Two other examples, VI and VII, of a posteriori probability ........ 62

CHAPTER VI

PROBABILITIES CONNECTED WITH GEOMETRICAL QUESTIONS

20. Suppositions as to probability of position of a point on a line ........ 65
21. Likewise for an area and for any multiple amplitude ........ 67
22. Illustrations:
    I, II, III, problems connected with a point on a line ........ 69
    IV, V, VI, with a number of points on a line ........ 70
    VII, points on a closed curve ........ 73
23. VIII: points on the surface of a sphere ........ 74
24. IX: Poincaré's problem on the probable number of intersections of two closed curves on a sphere ........ 77


CHAPTER VII

THEORY OF ERRORS

25. Procedure when various determinations of an unknown quantity have been obtained: assumption of arithmetic mean and deduction of "least squares" ........ 80
26. Alternative assumptions I, II ........ 81
27. General assumption (III) that the probability of an error is a function of the error alone: characteristic differential equation ........ 84
28. Deduction (IV) of the arithmetic mean as a special instance, and formation of the Gauss law of error ........ 85
29, 30. Discussion of the Gauss law of error ........ 86
31. Minor errors ........ 88
32. A resultant error (V), linearly compounded from errors obeying the Gauss law, itself obeys the law ........ 90

CHAPTER VIII

GAUSS'S LAW OF ERRORS

33. Measure of precision in the Gauss law: the 50 per cent zone ........ 92
34. Mean error: error of mean square ........ 93
35. Spread of errors ........ 94
36. Combination of determinations ........ 96
37. General conclusions as to spread of errors ........ 98
38. Remarks on combination of observations in practice ........ 99


Note on equal likelihood ........ 101

$\int_0^x e^{-x^2}\,dx$ for $x = 0$ to $3$ at intervals of $0.1$ ........ 103

List of Burnside's published papers, cognate with the theory of probability ........ 104

Index ........ 105


[From the Proceedings of the Royal Society, Ser. A,
vol. 117 (1928), pp. xi-xxv.]


WILLIAM  BURNSIDE 

William Burnside was born on July 2, 1852, the son of William
Burnside,  a  merchant,  of  7,  Howley  Place,  Paddington,  London. 
His  father  was  of  Scottish  ancestry :  his  grandfather,  who  had 
gone  to  London,  was  a  partner  in  the  bookselling  firm  of  Seeley 
and  Burnside. 

Left  an  orphan  at  the  age  of  six,  Burnside  was  educated  at 
Christ's  Hospital,  where  he  was  a  Grecian :  there,  besides  his 
distinction  in  the  grammar  school,  he  attained  the  highest  place 
in  the  mathematical  school.  Having  been  elected  to  an  entrance 
scholarship  at  St  John's  College,  Cambridge,  he  went  into 
residence  in  October,  1871,  and  was  regarded  as  the  best  man 
of  his  year  in  the  college.  In  accordance  with  the  general 
custom of capable students of mathematics in Cambridge, he
"coached" for the Tripos, his private tutor being W. H. Besant,
one  of  the  few  rivals  of  the  famous  Routh.  For  some  reason, 
Burnside  migrated  to  Pembroke  College  in  the  same  university, 
the  change  being  made  late  in  his  second  year  (May,  1873).  He 
graduated  in  the  Mathematical  Tripos  of  1875  as  second  wrangler, 
being bracketed with George Chrystal, who afterwards was professor
at Edinburgh; the fourth wrangler was R. F. Scott, now*
Master of St John's College. In the subsequent Smith's Prize
Examination, Burnside was first and Chrystal was second.

* The writer is indebted to Sir Robert Scott for several of the personal records
in this notice.

A fellowship at Pembroke was the worthy sequel of such a
degree: he continued a fellow from 1875 until 1886. He was
at once appointed to lecture in his college: and he lectured
also at Emmanuel in 1876 and at King's in 1877. At that time,
college teaching for the best students was sometimes shared by
a few colleges, in isolated groups, and included subjects selected
from the average normal course for Honours; and Burnside, in
addition, gave lectures on hydrodynamics, an advanced course
open to all the University. That particular subject was coming
into vogue again at Cambridge; attention, regularly paid to the
established work of Stokes, was stimulated by the then new
work of Greenhill and especially of Lamb. Burnside also examined
for the Mathematical Tripos from time to time. Occasionally,
he did some private coaching. But later it appeared that, instead
of restricting himself mainly to Tripos subjects in furtherance
of his lectures and his inevitable share in examinations, he
had launched himself upon a broad sea of study, then far
removed from the Tripos domain.

As  an  undergraduate,  he  had  proved  an  expert  oarsman. 
While  at  St  John's  College,  even  as  a  freshman,  he  had  rowed 
in  the  Lady  Margaret  First  Boat  which,  with  the  famous  Goldie 
as stroke, went head of the river in 1872. Rather light in weight
as  an  undergraduate,  too  light  (according  to  the  canons  of  the 
day)  to  be  considered  for  the  University  Boat,  he  was  always 
rather  spare  of  build  and  he  retained  a  wonderful  power  of 
endurance ;  and  he  kept  his  rowing  form  for  many  years.  He 
rowed  in  the  Pembroke  Boat  after  graduation,  as  long  as  he 
continued  in  residence;  he  was  a  splendid  "7,"  and  had  a  full 
share  in  its  steady  rise  on  the  river.  For  some  years  after  he  left 
Cambridge,  his  reputation  as  an  oar  survived  as  a  tradition  in 
college  circles. 

After  going  out  of  residence,  similar  opportunities  for  rowing 
were  not  accessible.  But  in  the  course  of  holidays  frequently 
spent  in  Scotland,  Burnside  had  acquired  a  zest  for  fishing ; 
and  for  many  a  summer  onwards  he  continued  to  go  there, 
pursuing  what  grew  to  be  his  favourite  sport.  As  in  rowing,  so 
in  fishing,  he  developed  skill  and  became  an  expert  fisherman ; 
indeed,  with  all  he  undertook,  nothing  short  of  his  best  was 
sufficient. 

In  1885,  at  the  instance  of  Mr  (afterwards  Sir)  William  Niven, 
the  Director  of  Naval  Instruction — himself  a  Cambridge  man, 
devoted to natural philosophy, as it was styled by good Newtonians —
Burnside was appointed professor of mathematics in the
Royal  Naval  College  at  Greenwich.  The  rest  of  his  teaching  life 
was  spent  in  that  post.  There  was  a  current  belief,  a  belief 
now  known  to  be  justified  by  fact,  that  his  old  college  had 
invited  him  to  return  to  important  office ;  but  he  remained  at 
Greenwich. His work was to his liking. It was a course, well-defined
in extent and in demands on time, within a variety of
congenial subjects, though only touching in part upon the
regions of his constructive thought. The actual teaching, with
its incident duties, left him adequate opportunity to keep abreast
of progress, even to advance progress, in the subjects of professional
duty. It also left him leisure, which was carefully
and diligently used, to pursue his own researches, whatever
their direction. Best of all to him, he was free from the interruptions
and the incessant small demands, business and social,
that are inseparable from official administration. For at all times,
and  in  all  ways,  multifarious  detail — whether  incidental  to  the 
non-scientific  side  of  official  duty,  or  the  current  presidency  of  a 
scientific  society  such  as  the  London  Mathematical,  even  the 
purely  algebraical  garniture  and  the  side-issues  in  mathematical 
investigations — such  detail  was  inexpressibly  irksome  to  his 
spirit. 

At  Greenwich,  Burnside's  work  was  devoted  to  the  training 
of  naval  officers.  It  consisted  of  three  ranges.  There  was  a  junior 
section  for  gunnery  and  torpedo  officers;  the  chief  subject  of 
study  was  the  principles  of  ballistics.  There  was  a  senior  section 
for  engineer  officers :  the  chief  subjects  of  study  were  strength  of 
materials,  dynamics,  and  heat  engines.  The  advanced  section — 
perhaps  that  in  which  he  exercised  the  greatest  influence  on  his 
students — was  reserved  for  the  class  of  naval  constructors ;  in 
that  range,  Burnside's  special  mastery  of  kinematics,  kinetics, 
and  hydrodynamics,  proved  invaluable.  Records  and  remembrance 
declare  that  he  was  a  fine  and  stimulating  teacher,  patient  with 
students in their difficulties and their questions — though elsewhere,
as in discussion with equals, his manner could have a
directness  that,  to  some,  might  appear  abrupt.  He  certainly 
earned  the  gratitude  of  his  students,  as  appeared  from  their 
spontaneous  token  of  tribute  to  him  when  he  left  in  1919 ;  the 
address,  which  they  then  presented,  was  treasured  by  him  and 
his  family. 

Burnside  had  married  Alexandrina  Urquhart  in  1886,  soon 
after  he  was  appointed  professor  at  Greenwich.  She  survives 
him,  with  their  family  of  two  sons  and  three  daughters. 

After his work at the Naval College had ended, the whole
family retired to West Wickham in Kent. Burnside, happy as
he had been in that work and regretting its actual termination,
enjoyed his leisure, spending it among his books, in fishing holidays
in Scotland and, not least, in his researches, some continued
in regions recognised as specially his own, some of them in the
systematic development of ideas in still another branch of mathematics
upon which his intellectual interests had settled. The
last  year  of  his  life  was  marked  by  failing  health:  and  the 
proximate  cause  of  his  death  was  a  recurrence  of  cerebral 
haemorrhage.  He  died  on  August  21,  1927  ;  and  he  is  buried  in 
West  Wickham  churchyard. 

In  recognition  of  his  eminence  as  a  mathematician,  not  a  few 
academic honours came to Burnside during his life. He was
never avid of honours; indeed, he was eager to avoid those
forms  of  academic  recognition  constituted  by  official  positions  of 
dignity,  when  they  demanded  the  performance  of  any  public 
duty  set  in  formal  pomp  or  circumstance.   He  received  honorary 
degrees,  Sc.D.  from  Dublin,  LL.D.  from   Edinburgh.    He  was 
elected  a  Fellow  of  the  Royal  Society  in  1893,  on   the  first 
occasion  of  candidature :  he  served  on  the  Council  of  that  body 
from  1901  to  1903 ;  and  he  was  awarded  one  of  the  two  Royal 
medals  for  the  year  1904.    He  was  a  member  of  the  Council  of 
the  London  Mathematical  Society  for  the  long  continuous  period 
from  1899  to  1917  :  there,  he  was  a  tower  of  strength,  in  advice 
during  the  Council's  meetings,  and  by  his  many  reports  as  a 
referee  upon  a  multitude  of  varied  original  papers  submitted 
by  a  small  army  of  authors.    He  was  awarded  the  De  Morgan 
medal  of  the  Society  in  1899.   From  1906  to  1908  he  served  as 
President :  while  willingly  allowing  his  name  to  be  submitted 
for  membership  of  the  Council  year  after  year,  he  accepted  their 
highest  office  only  with  grave  and  characteristic  reluctance.  The 
honour, in which he appeared to shew most interest, was conferred
on him in 1900. In that year he was elected an Honorary
Fellow  of  his  old  college,  Pembroke ;  and  at  the  time  of  his 
death  he  had  become  the  senior  on  the  small  roll  of  Honorary 
Fellows.    Yet,  even  in  the  few  and  far  from  fluent  remarks  of 
thanks  which   he  made  at  the  college  dinner  welcoming,  by 
courteous custom, the newly elected honorary members of the
foundation,  he  urged  that  the  happy  and  successful  pursuit  of 
research  was  its  own  reward ;  and  the  sincerity  of  his  plea  was 
appreciated  not  least  by  those  who  had  done  their  part  in 
recognition  of  his  labours. 

Burnside was frequently called upon to examine for the
Mathematical Tripos and for the open Civil Service examinations
of the highest grade. Occasionally, he acted as external
examiner  for  one  or  other  of  the  English  Universities,  as  well  as 
for  the  Naval  College  after  his  retirement.  He  was  not  an  easy 
examiner — before  his  early  days  of  such  duty,  the  phrase  "easy 
problems"  at  Cambridge  had  come  to  bear  a  perverse  significance. 
His  questions  could  be  of  the  type  which,  gathered  in  one  of  his 
papers,  might  justify  the  epithet  beautiful:  they  were  certainly 
too  beautiful  for  the  candidates  in  the  1881  Tripos,  the  first 
university  occasion  when  he  examined.  Yet,  though  they  often 
were  difficult  and  always  on  a  high  level,  they  were  set  with 
the  design  of  evoking  an  examinee's  thought,  rather  than  of 
providing an opportunity for the facile display of trained manipulative
skill along familiar lines.

Through many years, Burnside was in constant requisition as
a  referee,  for  the  Royal  Society  and  for  the  London  Mathematical 
Society.  He  could  not  be  called  lenient:  for,  however  sympathetic 
with writers, and especially young writers, he held a high standard
of the attainment that was deserving of publication. He
was  often  fruitful  in  suggestion.  He  could  even  be  severe  on 
occasion:  yet  he  would  mitigate  a  judgment  when  grounds  for 
its reconsideration were submitted. Similarly, as a critic of a
friend's  proof-sheets,  he  could  be  severe,  yet  always  objectively 
so:  he  obviously  assumed,  without  the  possibility  of  question, 
that  the  friend's  standard  and  his  own  were  alike  in  practice. 
Thus,  at  the  end  of  a  discussion,  the  friend  would  find  that 
added  light  had  been  cast  upon  the  whole  matter — surely  the 
best  criterion  of  sympathetic  criticism.  And  if  severe  with  others, 
he  was  stern  with  himself — a  mental  discipline  that  exercised 
its  influence  towards  the  directness  and  the  precision  both  of 
form  and  of  substance  in  his  writings. 

Valuable as were his teaching, his activity as an examiner,
and  his  influence  as  a  referee,  it  is  by  the  contributions  which 
he  has  made  to  his  science  that  Burnside's  name  will  be  held  in 
remembrance. 

His  range  was  wide;  for  it  branched  out,  through  applied 
mathematics  from  the  days  of  his  early  training,  into  great 
tracts  of  pure  mathematics  in  the  years  of  his  matured  powers. 
Yet,  even  in  the  later  time,  when  specialisation  has  tended  to 
become  acute,  he  could  specialise  with  the  best.  Though  of 
course  not  comparable  with  an  Euler,  a  Cauchy,  or  a  Cayley,  in 
the  variety  or  the  amount  of  work  he  has  left,  he  has  delved  in 
many fields and has left his trace in many directions. He published
over one hundred and fifty papers, as well as one treatise,
the  Theory  of  Groups,  of  which  a  second  (and  greatly  amplified) 
edition  was  issued  also  under  his  own  care.  He  has  also  left  a 
manuscript,  fairly  complete  as  far  as  it  was  carried,  on  the  theory 
of  probability.  He  himself  did  not  regard  this  work  as  finished; 
on  various  issues,  he  was  in  correspondence  from  time  to  time 
with  the  present  President  of  the  Royal  Society,  the  Astronomer 
Royal,  and  others;  and  he  certainly  did  not  consider  that  he  had 
resolved all his own questions. Had life in health lasted appreciably
longer, there is no doubt that he could have attained, as
he  intended  to  pursue,  further  development  in  a  subject  which 
occupied  much  of  the  thought  of  his  later  years. 

In  that  considerable  tale  of  papers,  most  are  short.  Very  many 
of  them  occupy  only  a  few  pages.  His  longest  individual  paper — 
he  never  used  the  more  ambitious  title  "memoir" — deals  with 
automorphic  functions:  it  really  consists  of  two  parts  connected, 
though  not  consecutive,  in  matter;  and  the  whole  occupies  no 
more than fifty-three octavo pages. Brief however as his papers
are,  it  can  fairly  be  asserted  that  each  one  of  them  contains 
some definite and recognisable result or results. He never discussed
side-issues; he would not even dwell on the minute
details  of  a  main  issue.  Indeed,  he  could  be  intellectually  bored 
by  processes,  that  halted  in  their  march  to  settle  subsidiary 
questions  as  they  arose;  with  him,  auxiliary  necessary  material 
was  set  out  before  the  main  advance.  When  once  an  issue  was 
attained,  he  was  content  to  let  it  stand  by  its  own  significance : 
to others he would leave attempts "to gild refined gold, to paint
the  lily." 

He  happily  was  saved  mathematical  controversy,  which  he 
detested.  On  one  occasion  he  was  surprised,  even  disturbed,  by 
the  receipt  of  an  unseemly  letter  the  very  tone  of  which  amazed 
him  (not  unjustifiably):  it  concerned  a  question  of  priority  which, 
in so far as it could affect a man punctilious in his acknowledgment
of the work of others, to Burnside was as thin as air, though
manifestly  not  so  to  the  writer  of  the  letter.  The  quiet  firmness 
of  Burnside's  answer  to  his  ungracious  correspondent  ended  the 
matter. On occasion, his work has been known to provide ammunition
for others. Thus in 1887 and 1888 he wrote papers on
the  kinetic  theory  of  gases,  a  subject  which  at  that  date  led 
to  much  disagreement  in  opinion;  stating  his  assumptions,  he 
dealt  with  the  average  exchange  of  energy  during  the  impact  of 
elastic  spheres  and  with  the  partition  of  energy  between  motions 
of  translation  and  of  rotation.  These  papers  can  only  have  been 
the  outcome  of  some  appeal  emanating  from  Tait.  The  result 
was  used  (but  Burnside  took  no  direct  part)  in  an  onslaught 
upon  Boltzmann's  work  made  by  Tait,  a  "bonnie  fechter,"  never 
reluctant  in  the  use  of  the  controversial  tomahawk. 

In  his  writings,  Burnside  had  a  style  which  precisely,  and 
habitually  (as  if  it  were  an  instinct),  contributed  to  efficiency 
of  presentation.  Even  while  an  undergraduate,  he  had  been  noted 
for  the  style  of  his  mathematical  work ;  he  was  reputed  to  be  the 
most  "elegant,"  though  not  the  most  widely  read  (Chrystal  was 
thus  reputed),  among  the  young  mathematicians  of  his  own 
standing. In pure literature, critics, whether analytic or constructive,
do not always agree upon the necessary essentials of
general  style,  though  they  can  select  individual  characteristics. 
In  scientific  productions,  the  task  is  assuredly  no  easier  than  in 
the  humanities.  Burnside  had  two  of  the  essential  secrets  of  an 
effective  style :  he  exercised  a  power  of  clear  and  precise 
thinking that was maintained until the achievement of a definite
issue; and he possessed a faculty of lucid (if condensed)
expression  of  the  whole  course  of  a  constructive  argument.  He 
was intolerant of approach to vague meandering: "Words, words"
would  be  his  caustic  comment  on  an  unconstructive  passage.  The 
elusive charm of the sudden thought, that in itself is a revelation,
is  rare  in  mathematics,  though  it  can  be  found  in  a  Fourier  or 
a  Salmon.  But  such  was  not  Burnside's  aim,  perhaps  never  his 
dream ;  he  did  not  seek  for  aught  else  than  clearness,  directness, 
terseness  most  of  all.  He  would  practise  no  art  in  trying  to  secure 
the  attention  of  an  inexpert  beginner.  In  exposition,  conciseness 
was  his  rule.  Once,  the  attempt  of  a  friend,  to  obtain  from  him 
a  more  expanded  treatment  of  some  early  stages  in  his  Theory 
of  Groups,  was  met  by  a  declaration  of  regret  that  he  had  been 
unable  to  effect  further  condensation.  The  consequence  is  that 
all  Burnside's  published  work  is  close  and  firm  in  texture;  yet, 
to an attentive reader, it is never lacking in clearness and movement.

Throughout Burnside's residence at Cambridge, the University
had been in the finest flower of her activity in applied
mathematics.  Stokes,  Cayley,  Adams,  were  long-established 
professors;  Maxwell's  appointment  had  been  more  recent.  The 
staple  subjects  for  the  most  capable  mathematical  students  were 
physical  astronomy,  dynamics,  light,  sound,  and  heat.  The  range 
of  electricity  and  magnetism,  except  for  a  slight  infusion  of  some 
of  the  work  of  Sir  William  Thomson  (afterwards  Lord  Kelvin), 
was  academic  and  unconnected  with  laboratory  knowledge ;  and 
Maxwell's  presentation,  based  on  the  researches  of  Faraday,  had 
still  to  make  its  place  in  the  Cambridge  course,  men  scarcely 
even  dreaming  of  the  revolution  it  was  to  accomplish  later.  Pure 
mathematics, save for the rare appearance of a Clifford, a Pendlebury,
or a Glaisher, was left to Cayley's domain, unfrequented
by  aspirants  for  high  place  in  the  Tripos.  Much  of  the  original 
thought  of  her  mathematicians  in  those  years  found  its  expression 
in  problems,  a  veritable  mine  of  isolated  results  propounded  as 
conundrums  in  the  Senate  House  and  in  College  examinations. 
Even  so,  the  worship  of  the  mathematical  spirit  at  the  shrine  of 
natural  philosophy  was  maintained  in  a  well-defined  conservative 
range. 

At  the  beginning  of  his  work,  Burnside  could  hardly  fail  to 
conform  to  this  Cambridge  use ;  indeed  as  regards  the  subjects 
(though not as regards all methods for the subjects) in applied
mathematics, he largely remained in the older round to the end.
Yet  even  while  he  continued  in  Cambridge,  he  was  gradually 
emerging  into  his  own  domain.  Bred  an  applied  mathematician 
in  the  Cambridge  school  of  natural  philosophy,  which  tended  to 
regard  all  mathematics  as  a  useful  tool — no  more  than  a  tool — 
in  so-called  practical  applications,  he  came  to  find  that  there 
was  a  world  of  pure  mathematics  different  from  that  which  filled 
the  receptive  stage  of  his  student  days.  In  the  creative  stage  of 
thinking for himself beyond the range of learning and of teaching
for the Tripos, he gradually made his way into that new
world.  He  took  rank  with  the  constructive  pure  mathematicians, 
without  losing  hold  of  his  earlier  studies.  Indeed  to  him,  as  to 
others  with  a  similar  experience,  the  new  knowledge  shed  fresh 
light  upon  the  older  interests ;  but  any  effective  combination  of 
the  old  and  the  new  could  only  be  made  by  an  intellect  of  the 
type  such  as  Burnside  happily  possessed. 

Thus,  as  already  stated,  Burnside's  earliest  advanced  lectures 
were  devoted  to  hydrodynamics.  Elsewhere,  the  old-fashioned 
methods  for  conjugate  functions,  stream-lines,  and  velocity- 
potential, were being analytically transformed through the introduction
of functions of a complex variable. For many a day,
Cambridge  had  preserved  an  almost  invincible  repulsion  to  the 
then objectionable $\sqrt{-1}$, cumbrous devices being adopted to
avoid  its  use  or  its  occurrence  wherever  possible.  But  some 
teachers could shew that, in two-dimensional fluid motion, simplicity
and new results alike were easily attainable by its means;
and  its  formal  debut  within  the  Cambridge  enclosure  was  made 
in  Lamb's  treatise.  To  Burnside's  intellect  the  new  calculus 
appealed;  and  as  a  matter  of  record,  his  first  published  paper 
(1883) is concerned with elliptic functions, not with hydrodynamics.

Three examples will suffice to illustrate the development in
Burnside's thought, thus indicated.

In  1888  he  investigates  three  main  questions  connected  with 
deep-water  waves  resulting  from  a  limited  initial  disturbance,  a 
research  probably  suggested  by  certain  phenomena  noted  in  the 
Krakatoa  eruption.  In  that  paper,  he  proceeds  by  analysis  which 
belongs  to  what  would  now  be  called  the  classical  methods  of 
Fresnel, Poisson, and Stokes; it requires much elaborate work
in  definite  integrals  with  real  variables,  without  any  reference 
to  the  (happily  satisfied)  convergence  of  those  integrals ;  and 
Burnside  arrives  at  direct  results  of  observable  significance, 
which  relate  to  the  greatest  amplitude  of  displacement,  the 
range  of  propagation,  and  the  governance  of  the  wave-length.  It 
is  not  without  interest,  in  connection  with  his  increasing  grasp 
of  newer  methods,  to  note  that  in  this  paper  he  "justifies"  the 
use  of  a  complex  value  for  a  constant — while,  two  years  later 
in  a  paper  which  deals  with  streaming  motion,  he  uses  complex 
variables  without  a  word  of  prelude  to  superfluous  justification. 
The  problem  of  the  two-dimensional  potential,  as  envisaged 
by  the  applied  mathematicians  in  the  middle  third  of  the  last 
century,  such  as  Green,  Stokes,  Thomson  and  Tait,  has  been 
completely  changed  by  the  ideas  of  the  theory  of  functions.  Old 
assumptions  have  had  their  significance  and  their  limitations 
revealed,  the  earlier  physicists  not  always  in  sympathy  with 
exacting  refinements  which  to  them  smack  of  pedantry,  the  later 
mathematicians  not  always  respectful  to  the  intuitions  content 
with  a  semblance  of  proof.  Burnside  knew  both  attitudes  of 
mind — the  earlier  from  his  training,  the  later  from  his  continued 
study ;  and  so  he  could  bring  old  results  to  new  issues.  Thus  in 
a  paper  (1891)  on  the  theory  of  the  two-dimensional  potential, 
satisfying  the  equation 

$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0,$$

and  determined  by  prescribed  conditions  within  an  area  and 
assigned  values  along  a  boundary,  he  returns  to  the  old  property 
— the  possession  of  every  undergraduate — that  the  potential  can 
have no maximum or minimum within the boundary. Pointing
out that maxima and minima must therefore lie on the boundary
and  that  conditions  of  continuity  require  their  aggregate  to  be 
an  even  integer,  he  obtains  a  relation  between  that  integer,  the 
integer  denoting  the  number  of  distinct  portions  of  the  boundary, 
and  the  integer  representing  the  number  of  double  points  on 
the  equipotential  contour  lines  as  they  pass  from  a  boundary  arc 
over  the  area  back  to  another  boundary  arc.  Moreover,  he  obtains 
the  relation  for  the  most  general  case  when  the  conditions  are 
extended so as to admit discontinuities (in the form of logarithmic
or  algebraic  infinities)  within  the  boundary;  and  he  indicates 
the  bearing  of  the  relation  on  the  graphs  of  these  contour  lines. 
In  1894  he  published  a  paper  discussing  Green's  Function  for 
a  system  of  non-intersecting  spheres.  There,  beginning  with  the 
known  result  for  two  spheres,  he  transformed  it  by  a  property 
he had deduced from a geometrical interpretation of homographic
substitutions. He extended the transformed result to
any  number  of  spheres.  By  inversions  which  are  represented  by 
point  transformations,  and  by  sets  of  inversions  which  accumulate 
into  a  group  of  transformations,  he  obtains  a  pseudo-automorphic 
function,  in  the  form  of  a  series  where  the  coefficients  of  the 
successive  terms  are  powers  of  the  magnification  at  the  successive 
inversions.  Lord  Kelvin  would  not  have  recognised  his  theory 
of  images  in  that  final  form  :  yet  the  development  into  that  form 
is only a continued amplification of the theory. Burnside, moreover,
carried it further, by connecting the application with any
solution of Laplace's equation, instead of the inverse distance
alone as in the theory of images. Here, as in all his investigations,
it was only too evident that he had wandered far from
the  ancient  Cambridge  fold. 

Various  well-marked  stages  in  the  progress  of  Burnside's 
knowledge  almost  indicate  themselves,  from  the  evidence  of  his 
original  papers. 

Apparently,  the  first  large  new  subject,  of  which  he  made  a 
profound  study,  was  elliptic  functions  :  its  rudiments  had  hardly 
been  admitted  to  his  Cambridge  course.  At  every  turn  he  devised 
something  novel.  Is  it  the  transformation  of  the  simplest  elliptic 
differential  element  ?  Noting  the  general  characteristic  of  the 
four  critical  points  in  the  Riemann  interpretation,  he  deals  with 
the  successive  possibilities  of  the  transformation  :  (a)  into  itself, 
by  interchanging  these  four  points  in  pairs,  with  the  obvious 
inference that there are three modes, which are explicitly obtained;
(b) into the Weierstrass normal form, with one of the
critical points sent to infinity, and the remaining three practically
arbitrary; (c) into the Legendre normal form, with the four points
symmetrically arranged round the origin along an axis; and
(d) into the Riemann normal form, with 0, 1, ∞ as three canonical
points  for  all,  and  the  fourth  defined  by  the  parametric  invariant 
of  the  element.  Is  it  so  simple  an  issue  as  the  division  of  the 
periods  by  3  or  by  9  ?  Even  for  the  simplest  form  of  that  issue, 
he  treats  it  by  a  general  method  and  not  by  any  special  artifice  : 
a  short  paper  in  1883  achieves  the  trisection  for  the  Jacobian 
elliptic  functions ;  a  later  paper  in  1887  achieves  the  same 
problem  for  the  Weierstrass  elliptic  functions ;  a  still  later  paper 
uses  the  same  method,  supplemented  by  the  introduction  of 
resolvents,  to  obtain  the  results  for  division  by  9.  Is  it  the 
extension  of  Jacobi's  expression  of  the  apparently  hyperelliptic 
integral 

$$\int \{x(1-x)(x-\lambda)(x-\kappa)(x-\kappa\lambda)\}^{-\frac{1}{2}}\,dx,$$
under the (quadratic) transformation
as the sum of two elliptic integrals? Burnside deals with the
cubic  and  the  quintic  transformations  in  odd  degree,  with  the 
quartic  transformation  in  even  degree,  and  obtains  the  respective 
types  of  degenerate  hyperelliptic  integrals;  characteristically 
leaving other instances as "exercises" (though not "easy"
exercises)  in  the  method  expounded.  And,  almost  as  an  incident, 
he  notes  a  case  when  an  apparently  elliptic  integral 

/  ^  K^  -  «)  (^  -  /^)  (^  -  7)  (^  -  ^)\''dx^ 


where  the  relation 
transforms the elementary elliptic differential into itself, is only
simply periodic. Or, to take only a last example in this range,
he completes the known proposition that the co-ordinates of a
point  on  the  intersection  of  two  quadrics  are  expressible  in  terms 
of  elliptic  functions,  by  constructing  the  actual  arguments ;  and 
he  shews  that  the  two  invariants  in  the  Weierstrass  form  are 
the  quadrinvariant  and  the  cubinvariant  of  the  customary  quartic 
equation occurring in the reference of the quadrics to their common
self-conjugate tetrahedron.

Another  subject  that  absorbed  his  attention  was  differential 
geometry,  which  also,  save  for  some  rarely  read  sections  in 
Salmon's  Geometry  of  Three  Dimensions,  hardly  entered  into 
the  Cambridge  course.  Burnside  gathers  together  fundamental 
propositions, then accessible only by search among widely scattered
authorities; and he applies them with effect. Before 1890,
the parameters of nul lines on a surface had not appeared (or
perhaps, only with Cayley) in English memoirs. In one paper,
Burnside uses them, with severe ingenuity, to obtain the different
classes of surfaces that possess plane lines of curvature.
In another paper, he uses them to construct the differential equation
of all confocal sphero-conics, proving that the co-ordinates
of points are expressible in terms of elliptic functions of a parametric
argument which is obtained explicitly. There, as always
in  his  papers,  Burnside's  work  marches  forward  to  a  definite 
issue  and  constitutes  a  contribution  to  knowledge. 

Comparatively simple known properties are given a widened
significance.  Thus  he  takes  the  known  property  that  two  finite 
screws  compound  into  a  single  screw ;  and  (1890)  he  devises  a 
simple  geometrical  construction  for  the  axis  of  the  resultant 
screw.  He  notes  that,  as  the  proof  does  not  require  the  use  of 
parallels,  the  result  is  valid  for  elliptic  space  and  for  hyperbolic 
space.  Five  years  later,  he  returns  to  the  matter  in  a  paper  on 
the  kinematics  of  non-Euclidean  space ;  and  now  he  notes  that 
displacements correspond to point-transformations, sets of displacements
to groups of transformations. The theory of groups
is  beginning  to  affect  his  work. 

He  can  derive  new  results  from  elementary  results  in  ordinary 
geometry,  as  well  as  from  the  range  of  abstract  geometry.  His 
interpretation  of  a  homographic  substitution 
as inversion at two fixed circles (this 1891 paper seems the
first occasion when the specific mention of a group is made in
his published work) is used to assign the criteria, necessary and
sufficient, to determine whether a group, formed of assigned
fundamental transformations, will or will not contain a loxodromic
substitution. Or he will deal with the ancient problem
of  drawing  a  straight  line  between  two  points,  for  which  the 
ruler  suffices  in  the  Euclidean  postulate  when  the  points  lie  at 
an  implicitly  supposed  finite  distance  apart ;  and  he  gives  a 
construction  for  the  cases,  when  one  of  the  points  is  at  infinity, 
when  both  of  them  are  at  infinity,  when  one  of  them  is  the  ideal 
point  required  in  projective  geometry ;  his  construction  applies 
to  any  space,  Euclidean,  elliptic,  hyperbolic.  Or  he  will  take  a 
proposition (analytically established) concerning the four rotations
by which a triply orthogonal frame of lines can be displaced
into coincidence with a similar frame; by the use of a
known (Hamilton) proposition in rotations, he gives a geometrical
construction for the displacement, a construction which
seems  almost  obvious — after  it  has  been  obtained.  Or  he  will 
proceed  to  abstract  space :  he  discusses  a  configuration  of  27 
hyperplanes  and  72  points  in  space  of  four  dimensions,  such 
that  six  of  the  planes  pass  through  each  point  and  sixteen  of  the 
points  lie  in  each  of  the  planes.  To  him  it  is  a  natural  extension 
of  the  customary  configuration  of  the  27  lines  on  an  ordinary 
cubic  surface  in  three  dimensions. 

Burnside's  investigations  in  elliptic  functions  compelled  him 
to  range  in  the  wider  field  of  the  theory  of  functions  in  general ; 
so  thither  he  had  proceeded  and,  in  his  progress,  he  became  an 
investigator. 

His  contributions  are,  as  ever,  varied  in  range.  Fifty  years 
ago,  it  was  a  surprise — to-day,  it  is  almost  a  commonplace — to 
learn  that  functions  of  real  variables  exist,  which  are  always 
finite,  are  always  continuous,  and  never  possess  a  determinate 
differential coefficient: the now classical example, due to Weierstrass,
is that of the series

$$\sum_{n=0}^{\infty} b^n \cos a^n\theta,$$

where $a$ is any uneven positive integer, and $b$ is a real positive
quantity such that $ab > 1 + \frac{3}{2}\pi$. Burnside made a step in advance
(1894).  He  shewed  that  there  are  functions  of  real  variables, 
everywhere  finite,  everywhere  uniformly  convergent,  everywhere 
possessing  the  unrestricted  complement  of  successive  differential 
coefficients, yet never expansible in power-series; and, as an
illustration,  he  constructs  the  function 


$$\sum_{n=0}^{\infty} \frac{1}{n!}\,\frac{1}{1 + a^{2n}(x - \tan n\alpha)^2},$$

where $a$ is real and > 1, and where $\alpha/\pi$ is not a rational fraction.
His  proof  is  concise  and  demands  no  acquaintance  with  elaborate 
theory; as usual, it leads direct to a definite result that completes
the investigation.

On  another  occasion,  he  deals  with  the  Schwarz  solution  of  the 
problem  of  representing  a  closed  convex  polygon  in  one  plane 
conformally  upon  the  half  of  another  plane — a  result  that  has 
rendered signal service in mathematical investigations in matters
so diverse as heat, hydrodynamics, and electricity. In these
last applications, only the simplest examples are used: in the
general Schwarz solution, an Abelian integral occurs the use of
which is gravely handicapped by its multiplicity of periods;
so that additional conditions become necessary to render the
analysis specific in application. Burnside, already skilled in
polyhedral functions and general automorphic functions, investigates
the aggregate of instances where, at the utmost,
doubly-periodic functions will suffice. But he goes on to deal
with multiply-connected spaces having polygonal boundaries:
in particular, he gives the solution for the conformal representation
of the doubly-connected area which lies between two
concentric  similarly  placed  squares,  the  side  of  one  square  being 
double  that  of  the  other. 

He  seizes  upon  the  existence-theorem  which  establishes  the 
possibility  of  expressing  the  co-ordinates  of  a  point  on  an  algebraic 
curve  by  means  of  uniform  functions  that  are  automorphic  under 
sets  of  transformation.  The  lack  of  determination  of  the  group, 
appropriate  to  a  postulated  equation,  has  left  the  solution  as 
one  merely  of  existence  without  specific  determination.  Burnside, 
combining  his  knowledge  of  groups,  of  elliptic  functions,  and  of 
Klein's  icosahedral  functions,  gives  a  complete  specific  resolution 
of  the  problem  for  the  (apparently)  hyperelliptic  equation 

$$y^2 = x(x^4 - 1).$$

It  is  unnecessary  to  accumulate  more  instances.    Burnside's 
matured development flashed out in his double paper on automorphic
functions, published in 1892. The subject belonged
to a new section of mathematical knowledge, mainly inaugurated
by Henri Poincaré and systematically expounded in a series of
memoirs, now classical, in the initial volumes of Acta Mathematica.
The underlying idea is simple. Trigonometrical functions
are  singly-periodic :  that  is,  each  such  function  is  unchanged 
when  its  argument  suffers  an  increment  or  a  decrement  which 
is  any  integer  multiple  of  a  single  quantity.  Elliptic  functions 
are  doubly-periodic :  that  is,  each  such  function  is  unchanged 
when  its  argument  similarly  suffers  an  increment  or  a  decrement 
which is a linear combination of any independent integer multiples
of two quantities (the ratio of these quantities must not be
real). Jacobi had proved long ago that uniform functions of
triple periodicity (and, a fortiori, of periodicity higher than triple)
in a single variable do not exist. But in every such instance the
modification of the argument consists solely of an additive increment
or decrement. The question arises: What is the most
general  type  of  periodicity  for  a  function  of  one  argument?  And 
it  naturally  entails  the  further  question  :  What  are  the  functions 
possessing  that  type  of  periodicity?  Isolated  results  were  known, 
such  as  Jacobi's  elliptic  modular  functions  and  Klein's  polyhedral 
functions :  their  significance  as  examples  of  a  wider  theory  had 
not appeared. It was Poincaré who presented the first general
treatment  of  these  questions. 

Into this work of Poincaré, Burnside plunged. In it he revelled.
His new results are embodied in the paper on automorphic
functions which has just been cited. In particular, Poincaré
had overstated an exclusive central result. Burnside detected
the overstatement and the fundamental cause; and he devised
a new class of automorphic functions, simpler than any of the
classes devised by Poincaré. The full theory, even now, remains
to  be  established:  it  awaits  the  construction  (or  the  equivalent 
of  the  construction)  of  a  central  function  or  functions  which, 
while  palpably  automorphic,  shall  be  amenable  to  ordinary 
analytical manipulation as are the corresponding central theta-functions
of purely incremental periodicity. When the history
of  that  theory  comes  to  be  written,  Burnside's  name  will  hold 
an  honourable  place  in  the  record. 
The consideration of the very foundation of these automorphic
functions led Burnside further afield, along a way already
opening  out  before  him  in  his  progress,  into  a  region  which  he 
explored  with  ample  discovery.    It  was  to  provide  the  most 
continuous  and   most  conspicuous  of  his  contributions  to  his 
science. The characteristic property of every automorphic function
of a single variable is that, without change in the value of
the function, its argument is subject to a number of discrete
operations, which are independent of one another, are capable
of unlimited repetition and reversion, and admit all possible combinations,
repetitions, and reversions, in unrestricted sequence.
The  aggregate  of  all   the  operations,  which   thus  emerge,  is 
termed  a  group,  so  that  a  function  can  be  automorphic  under 
a  group  of  transformations  (or  substitutions).  But  just  as  the 
properties  of  the  integers,  which  occur  in  the  arithmetic  of  any 
calculation, merge into the general theory of number which
ignores all specific application, so the properties of transformations
in a group merge into a more comprehensive calculus.
That  calculus  deals  with  the  composition,  the  construction,  the 
resolution,  and  the  essential  properties,  of  a  group  regarded  as 
an  abstract  entity  whose  component  elements  are  subjected  to 
mathematical  laws  of  combination.  It  is  no  part  of  that  calculus 
to   take  account  of  possible  regions  of  application:   instances 
present  themselves  in  algebraic  equations,  in  analytic  functions, 
in  differential  equations,  in  divisions  of  space  of  different  orders 
of  dimension,  in  the  displacements  of  a  solid  body,  in  invariants 
and covariants of all kinds, a selection of subjects manifestly
not  complete. 

The earliest expression of the notion and its initial development
are due to Galois: he indicated the kind of relation that
could  exist  between  the  properties  of  an  algebraic  equation  and 
some  corresponding  group  of  finite  order.  The  early  growth  of 
the theory was due to French mathematicians, Cauchy in particular,
then Serret. Somewhat later came the fine exposition
by Jordan who, it may be mentioned, had Klein and Lie as
pupils at the outbreak of the Franco-Prussian war in 1870.
Down to that date, the subject revolved round algebraic equations
as a centre.
The interest in the theory began to spread. The next real
extension was due to Sylow, in a memoir on groups of substitutions.
Then followed a partial construction of its mathematics
as a pure calculus, without regard to applications: the contributions
of Cayley and of Weber may be noted. The theory soon
divided  itself  into  two  co-ordinate  sections,  sometimes  advancing 
as  pure  calculus,  sometimes  extending  to  new  regions  of 
application.  A  theory  of  continuous  groups  branched  off  into 
complete  independence;  it  became  a  great  body  of  mathematical 
doctrine,  under  the  inspired  researches  of  Sophus  Lie  and  his 
disciples.  The  theory  of  discontinuous  groups  attracted  an 
equally  ardent  band  of  investigators:  the  names  of  Klein, 
Burnside, Frobenius, Hölder, and Dyck, recall diverse developments
in theory and in use.

It  was  to  the  theory  of  discontinuous  groups  of  finite  order 
that  Burnside  mainly  devoted  his  attention.  Scattered  references 
to  such  groups  occur  in  some  of  his  papers  already  cited.  At  first, 
their  occurrence  seems  merely  incidental;  then  they  almost  prove 
that  his  thought  was  gradually  accumulating  the  evidences  of 
a  connected  theory.   From  the  early  nineties  onward  through 
much   of  the   remainder   of  his   life,  Burnside's   constructive 
thought  concentrated  on  the  subject.  Paper  after  paper  appeared 
from him, on a vast variety of associated topics, in ordered development,
each providing some fresh contribution, all of them
marked  by  imaginative  insight  and  compelling  power.   They 
found  their  first  culmination  in  his  book  on  the  Theory  of 
Groups,  published  in  1897.  That  volume  was  a  systematic  and 
continuous  exposition  of  the  pure  calculus  of  the  theory  as  it  then 
stood ;  and  it  embodied  the  researches  of  other  workers  in  Europe 
and  America  (always  with  ample  references)  as  well  as  his  own. 
His papers on the theory of groups continued, unhastingly, unrestingly.
A second edition of the book, considerably more
extended  than  the  first,  appeared  in  1909.  Even  so,  his  activity 
in  the  subject  still  continued,  though  with  a  gradually  decreasing 
production.    He   published  over  fifty  separate  papers  on  this 
range  of  knowledge  alone;   each  of  them,  even  the  briefest, 
contained  some   definite  result  or  results  of  significance.    All 
this  work,  original   from   himself,  is  a  splendid  contribution 
emanating from one mind and, of itself, is sufficient to secure
the  remembrance  of  his  name. 

With  the  coming  of  the  war  in  1914  and  during  its  course, 
there  was  a  comparative  cessation  in  Burnside's  productivity. 
His  frame  was  almost  as  lithe  as  ever  and  apparently  as  full  of 
easy  spring,  as  though  to  belie  the  passage  of  years.  Some  of 
his  constructive  activity  passed  silently  into  the  service  of  his 
country  in  certain  naval  matters.  In  those  years  he  undoubtedly 
continued  to  produce  papers ;  but  the  main  body  of  his  work 
could  be  regarded  as  verging  towards  its  termination. 

One  new  subject,  however,  secured  some  regular  attention 
from  him,  even  amid  his  unbroken  interest  in  groups.  It  may 
have  originated  from  the  mathematics  of  some  war  problems, 
and  its  interest  may  have  been  fostered  as  he  pondered  over  the 
combinations  of  diverging  results  of  observations.  In  the  year 
1918  he  produced  a  short  paper  dealing  with  a  question  in 
probability,  purely  mathematical  as  propounded;  and  it  was 
followed,  from  time  to  time,  by  other  papers,  some  suggested  by 
practical  problems.  Probability,  as  a  mathematical  theory,  has 
not  yet  lent  itself  to  a  single  process  of  organised  development 
based  on  any  unique  set  of  ideas,  which  are  generally  accepted 
as  fundamental.  Even  the  method  of  almost  universal  use  in 
astronomical  observations  depends  upon  the  Gauss  assumption 
of  the  arithmetic  mean  of  a  number  of  discordant  observations, 
as  the  best  measure  of  the  unknown  quantity.  But  that 
assumption  stands  as  only  one  out  of  many  inferences  from  the 
less  arbitrary  assumption  that  the  probability  of  an  error,  in 
any  observation,  is  some  function  solely  of  the  deviation  from  the 
unknown  accurate  measure ;  with  that  less  arbitrary  assumption, 
a  more  general  inference  is  that  the  difference  between  the 
unknown  measure  and  the  arithmetic  mean  is  some  symmetric 
function  of  the  differences  between  the  observed  magnitudes. 
(Of course, the occurrence of the symmetric function modifies the
law of facility of error: or the adoption of an admissible law, not
inconsistent with the assumption and differing from the exponential
law, determines the form of the symmetric function.)
Burnside  deals  only  with  the  arithmetic  mean  :  thus  tacitly,  with 
other  writers,  making  the  symmetric  function  to  be  zero.   As 
indicated earlier, he did not consider that he had resolved all
his  difficulties.  Ever  a  severe  critic,  he  remained  critical  of 
himself;  he  was  not  afraid  to  modify  an  opinion;  he  did  not 
hesitate  to  abandon  an  opinion,  if  ever  he  regarded  it  as  not 
fully  tenable,  as  indeed  happened  in  fact.  The  manuscript, 
which  he  has  left  and  which  will  be  published  by  the  Cambridge 
University  Press*,  is  the  expression  of  his  views  so  far  as  they 
had  been  framed  into  a  system. 

There  is  one  activity  in  human  nature  which  exercises  a 
perennial  lure  for  living  minds.  When  a  worker  of  recognised 
distinction  in  any  field  has  completed  his  contribution  to  thought, 
some  survivors  delight  in  assigning  him  his  place  in  an  ordered 
hierarchy  of  memorable  names.  The  task  demands  an  easy 
omniscience  which  shall  gauge  all  knowledge  and  all  intellect,  if 
the  estimate  of  precedence  in  relative  merit  is  to  be  promulgated 
with  authority  and  received  with  belief.  Yet,  somehow,  such 
estimates  lack  the  quality  of  permanence.  Nearly  two  thousand 
years  ago  Lucretius,  the  brilliant  expositor  of  natural  philosophy 
in  an  age  of  culture,  described  Epicurus  as  a  man 

Qui  genus  humanum  ingenio  superavit, 

a  tribute  paid  two  full  centuries  after  the  death  of  the  Greek 
philosopher  of  the  atom  :  the  world  to-day,  if  it  ever  hears  of 
the  name  thus  lauded,  greets  the  judgment  with  a  smile.  Less 
confident  men  may,  in  their  own  day,  render  a  more  modest  yet 
equally  sincere  homage  to  a  passing  spirit,  from  their  reverence 
for  the  genius  that  has  striven  and  in  their  remembrance  of  the 
worldly  task  that  has  been  done.  Burnside,  during  a  life  of 
steadfast  devotion  to  his  science,  has  contributed  to  many  an 
issue.  In  one  of  the  most  abstract  domains  of  thought,  he  has 
systematised  and  amplified  its  range  so  that,  there,  his  work 
stands  as  a  landmark  in  the  widening  expanse  of  knowledge. 
Whatever be the estimate of Burnside made by posterity, contemporaries
salute him as a Master among the mathematicians
of his own generation.

A.  R.  F. 

November  11,  1927. 

*  It  is  embodied  in  the  present  volume. 


THEORY  OF  PROBABILITY 


CHAPTER  I 

INTRODUCTION 

1. The words "probable" and "likely" continually occur in
conversation, as also does the substantive "probability" though
not so frequently.

"The glass fell a lot last night, it will probably rain today"
or "The barometer has fallen half-an-inch since yesterday, there
is a probability of rain before night" are statements such as we
all  have  heard  at  the  breakfast  table.  The  hearer,  if  he  treats 
them  as  anything  more  than  attempts  at  starting  conversation, 
will  regard  them  as  more  or  less  vague  judgments  founded  on 
the  speaker's  previous  experience.  He  will  certainly  not  recognize 
any  numerical  precision  in  them. 

A more speculative acquaintance having examined the barometer
might say, "I'll bet you 2 to 1 in half-crowns it will rain
before night"; to which the answer might be, "No, but I'll take
3 to 1." Here both speakers do apparently make some rough
kind of numerical estimate of the probability of its raining
before night. Their estimates however apparently do not agree,
nor would an audience infer that either speaker attached numerical
precision to his estimate.
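
The two offers in this exchange can be turned into rough numbers. The following is only a minimal sketch, under an assumption that the text itself does not state: a bet laid at "a to b" on an event is read as break-even, so that the implied probability is a/(a+b). The function name `implied_probability` is hypothetical and is introduced purely for illustration.

```python
from fractions import Fraction


def implied_probability(stake_on: int, stake_against: int) -> Fraction:
    """Probability at which a bet laid at `stake_on` to `stake_against` on an event breaks even.

    Assumption (not from the text): the bettor loses `stake_on` if the event
    fails and wins `stake_against` if it happens.
    """
    if stake_on <= 0 or stake_against <= 0:
        raise ValueError("both stakes must be positive")
    # Expected gain is zero when p * stake_against == (1 - p) * stake_on.
    return Fraction(stake_on, stake_on + stake_against)


first_speaker = implied_probability(2, 1)   # the offer of 2 to 1 on rain
second_speaker = implied_probability(3, 1)  # the counter-offer of 3 to 1
print(first_speaker, second_speaker)
```

Under that reading the two offers correspond to 2/3 and 3/4, which differ, in keeping with the remark that the estimates do not agree. Neither figure should be taken as more than a rough indication of what either speaker believes.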

Let us take another set of statements involving the words
"likely, probable, probability." The captains of two cricket
teams habitually determine the choice of innings by spinning a
coin. They would certainly repudiate the suggestion that a coin
is more likely to fall head than to fall tail. They would assent
to the statements:

"When a coin is spun, it is equally likely to fall head or tail":

"When  a  coin  is  spun,  the  probability  of  its  falling  head  is 
the  same  as  the  probability  of  its  falling  tail." 

Though  they  might  find  a  difficulty  in  explaining,  without 
using the words "likely, probable, probability," the meaning of
these  statements  they  assent  to,  they  undoubtedly  act  upon 
them. 

In the above sentences between inverted commas, the words
"probable", "likely", "probability" are used in a more or less vague
conversational  sense.  Now  when  the  calculation  of  probabilities 
is  spoken  of,  it  is  implied  that  the  probabilities  in  question 
are  capable  of  being  measured  or  specified  by  numbers.  About 
such  probabilities  there  can  be  nothing  vague.  It  may  very  well 
be  the  case  that  some  of  the  probabilities  of  ordinary  conversa- 
tion cannot  in  any  way  be  brought  under  the  head  of  calculable 
probabilities  ;  while  others,  by  making  suitable  assumptions,  can. 
For  instance,  probabilities  connected  with  the  question  of 
whether  it  will  rain  before  night  may  be  found  to  belong  to  the 
first  class,  while  those  connected  with  the  fall  of  a  coin  may 
belong  to  the  second. 

A  probability  can  only  be  said  to  be  measured  or  represented 
by  a  number,  when  a  rule  exists  by  means  of  which  the  number 
can  be  calculated  from  a  sufficiently  extensive  set  of  data.  Before 
stating  such  a  rule,  it  will  be  convenient  to  begin  with  ex- 
planations of  both  the  phraseology  and  the  notation  that  will 
be  used. 

When  making  a  trial  or  choice  is  spoken  of,  it  is  implied  that 
the  result  of  this  trial  or  choice  is  uncertain.  The  degree  of 
uncertainty  will  depend  upon  the  nature  of  the  trial  or  choice. 
A  somewhat  typical  case  is  the  choice  of  an  integer.  When  no 
condition  is  imposed  on  the  result  of  the  choice,  the  number  of 
results  is  clearly  unlimited.  When  an  integer  is  expressed  in 
the  scale  10,  the  sum  of  its  digits,  s,  and  the  number  of  its  zero 
digits,  z,  are  definite  numbers.  All  integers  may  be  divided  into 
two  classes :  those  for  which  s  does  not  exceed  20,  and  those  for 
which  s  does  exceed  20.  The  condition  that  "in  the  integer 
chosen,  s  does  not  exceed  20  "  is  a  limitation  on  the  trial  or 
choice,  for  it  cuts  out  a  number  of  what  were  possible  results. 
There  are  still  however  an  unlimited  number  of  results.  In  the 
same  way  the  condition  that  "  in  the  integer  chosen,  z  does  not 
exceed  10 "  is  a  limitation  on  the  choice ;  and  with  this  con- 
dition by  itself  satisfied  the  number  of  results  is  still  unlimited. 
If,  however,  both  conditions  are  satisfied,  the  number  of  results  is 
no  longer  unlimited.  It  is  clear  that  there  must  be  an  unlimited 
variety  of  sets  of  conditions,  such  that  when  those  of  a  single  set 
are  all  satisfied,  the  number  of  results  of  the  choice  is  finite. 



If the two above conditions are called conditions A and B, then
when they are satisfied there is a finite number $n_{AB}$ ($= n_{BA}$) of
results of the choice.
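As an illustration only, the finite number of results left by the two conditions can be counted mechanically. The following Python sketch assumes that "integer" means a positive integer written in the scale 10 without leading zeros; the function `count_integers` and its parameter names are introduced here for the purpose and do not occur in the text. It counts the integers in which s does not exceed 20 and z does not exceed 10, and then those which in addition have a given leading digit.

```python
from functools import lru_cache


def count_integers(max_digit_sum, max_zero_digits, leading_digit=None):
    """Count positive integers whose digit sum is <= max_digit_sum and which
    contain at most max_zero_digits zero digits (optionally with a fixed
    leading digit)."""

    @lru_cache(maxsize=None)
    def tails(sum_left, zeros_left):
        # Number of digit strings (possibly empty) that may follow the digits
        # already written without breaking either bound.
        total = 1  # the empty continuation
        if zeros_left > 0:
            total += tails(sum_left, zeros_left - 1)    # next digit is 0
        for d in range(1, min(9, sum_left) + 1):
            total += tails(sum_left - d, zeros_left)    # next digit is d
        return total

    first_digits = range(1, 10) if leading_digit is None else [leading_digit]
    return sum(tails(max_digit_sum - d, max_zero_digits)
               for d in first_digits if d <= max_digit_sum)


n_AB = count_integers(20, 10)                     # both conditions on s and z
n_ABC = count_integers(20, 10, leading_digit=3)   # and leading digit 3
n_ABC_not = n_AB - n_ABC                          # leading digit other than 3
print(n_AB, n_ABC, n_ABC_not)
```

The count is finite because every digit other than zero adds at least 1 to s, so that an admissible integer has at most 20 + 10 digits.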

Now consider some further condition C, for instance, that
"the leading digit in the integer is 3." The number of results
of the trial when, in addition to conditions A and B, a condition C
is also satisfied will, in general, be less than $n_{AB}$. Denote it by
$n_{ABC}$. This set of $n_{ABC}$ results belongs to the $n_{AB}$ results; and in
those of the $n_{AB}$ results which do not belong to the $n_{ABC}$, the
condition C is not satisfied. If then

$$n_{AB} = n_{ABC} + n_{ABC'},$$

$n_{ABC'}$ is the number of the $n_{AB}$ results in which the condition C
is not satisfied. When $n_{ABC}$ is greater than unity, it will clearly be
possible to divide the $n_{ABC}$ results in which conditions A, B and
C are all satisfied, into two sets by means of a fourth condition
D which is satisfied by some and is not satisfied by the rest. In
general the $n_{ABC'}$ results will also be divided into two sets by the
condition D, so that

$$n_{AB} = n_{ABCD} + n_{ABCD'} + n_{ABC'D} + n_{ABC'D'},$$

where, for instance, $n_{ABC'D}$ is the number of integers which
satisfy conditions A, B, D, and do not satisfy condition C. The
order in which the letters in a suffix are written in this notation
is immaterial.

If any one of the numbers such as $n_{ABCD}$ is greater than unity, this
process may be continued by introducing new conditions.

Finally the $n_{AB}$ results of the choice, which satisfy conditions
A and B, may be distinguished from each other by a finite number
of other conditions C, D, E, F, .... To each one of them will
correspond a symbol CD...E'F'..., implying that for it con-
ditions C, D, ... are satisfied and conditions E, F, ... are not
satisfied.

The  suffix  notation,  that  has  been  introduced  and  explained 
in  the  case  of  the  choice  of  an  integer,  is  quite  independent  of 
the  particular  case  to  which  it  has  been  applied.  In  the  pre- 
ceding illustration,  with  regard  to  conditions  A,  B  and  C  which 
have  been  stated  explicitly,  it  is  clear  that,  as  regards  each,  an 
integer  must  either  satisfy  it  or  not  satisfy  it.  There  are  no 
ambiguous cases. It is assumed, once for all, that a condition
introduced in connection with other trials or choices is such
that  a  result  either  does  satisfy  it  or  does  not.  With  this 
assumption,  the  suffix  notation  may  clearly  be  extended  to  dis- 
tinguish between  the  results  of  any  trial  or  choice. 

Even  when  the  results  of  a  trial  are  subjected  to  no  conditions, 
their  number  may  be  limited  owing  to  the  nature  of  the  trial 
itself.  In  such  cases,  the  introduction  of  conditions  to  be  satisfied 
by  the  results  will  in  general  involve  a  further  limitation  of  the 
number  of  results. 

2.  With  these  explanations,  the  rule  for  calculating  calculable 
probabilities  may  now  be  stated. 

Rule. The results of a trial or choice, or the trial itself, or
both the trial and the results, are subject to such conditions that,
wherever, whenever and by whomever the trial is made, there are
just $n$ possible results, of which one must occur and only one can
occur. If in $n_A$ of these results the condition A is satisfied, while
in the remaining $n - n_A$ it is not satisfied, the probability that the
condition A is satisfied, when a trial is made, is $n_A/n$; provided
that each two of the $n$ results are assumed to be equally likely.

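The rule on which the calculation of probabilities depends has
been stated in a variety of forms. For instance, Poincaré puts it
in the following form *:

La probabilité d'un événement est le rapport du nombre
des cas favorables à cet événement au nombre total des cas
possibles; à condition que tous les cas soient également
vraisemblables.

[The probability of an event is the ratio of the number of cases
favourable to that event to the total number of possible cases,
provided that all the cases are equally likely.]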

In  a  Note  (p.  101),  some  reasons  will  be  given  for  the  form 
chosen  here,  and  especially  for  the  way  in  which  the  assumption 
of equal likelihood has been made. The number $n_A$ is some
integer from 0 to $n$, both inclusive. If $n_A$ is neither 0 nor $n$, the
results of the trial are divided into two sets by condition A,
namely those in which condition A is satisfied and those in which
it is not satisfied; but, if $n_A$ is either zero or $n$, this is not so.
Condition  A  will  be  said  to  be  relevant  to  the  trial  in  the  first 
case  and  not  relevant  in  the  second. 

* Calcul des Probabilités, 1re éd., pp. 1, 3; 2me éd., pp. 24, 26.

Suppose now that condition A is relevant to the trial, and
consider the new trial which is subject to the further condition
that condition A is satisfied. This new trial has just $n_A$ possible
results. In $n_{AB}$ of these, the condition B is satisfied; and in the
remaining $n_A - n_{AB}$, it is not satisfied. It follows from the rule that
the probability that in the new trial condition B is satisfied is
$n_{AB}/n_A$; since each two of the $n_A$ possible results have already
been assumed to be equally likely. In the same way, in the new
trial which is subject to the condition that condition A is not
satisfied, the probability that condition B is satisfied is $n_{A'B}/n_{A'}$.
Hence when the trial is made and the condition A is satisfied,
the probability that condition B is satisfied is not in general the
same, as when the trial is made and condition A is not satisfied.

If however condition A is not relevant to the trial, the pro-
bability, $n_B/n$, that the condition B is satisfied does not depend
on whether $n_A$ is zero or $n$; i.e. on whether A is satisfied or not.

The suffix notation, which has been introduced for distinguish-
ing between the results of a trial, will also be used in representing
probabilities. Thus $p_A$ will denote the probability that, when the
trial is made, condition A is satisfied: $p_{AB'}$ ($= p_{B'A}$) will denote
the probability that condition A is satisfied and condition B is
not satisfied, and so on. A convenient extension of this notation
is to use $p_{(B)A}$ for the probability that, when condition B is
satisfied, condition A may be satisfied.
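A small numerical case may make the notation concrete. The sketch below is an illustration under assumptions that are not in the text: the trial is a single throw of a die with six equally likely results, condition A is "the number thrown is even" and condition B is "the number thrown is at least 4". The helper `count` is a name chosen here. Each probability is obtained directly from the rule as a ratio of counts.

```python
from fractions import Fraction

results = range(1, 7)                   # the n = 6 results of one throw


def count(*conditions):
    """Number of results satisfying every condition given."""
    return sum(all(c(r) for c in conditions) for r in results)


A = lambda r: r % 2 == 0                # hypothetical condition A: even
B = lambda r: r >= 4                    # hypothetical condition B: at least 4
not_A = lambda r: not A(r)

n = count()                             # no condition imposed: all n results
p_A = Fraction(count(A), n)                                   # n_A / n
p_B_when_A = Fraction(count(A, B), count(A))                  # n_AB / n_A
p_B_when_not_A = Fraction(count(not_A, B), count(not_A))      # n_A'B / n_A'
print(p_A, p_B_when_A, p_B_when_not_A)
```

Here the two conditional values differ, so that in this example the probability of B does depend on whether A is satisfied.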

Suppose that $A_1, A_2, \ldots, A_{s-1}$ is a set of conditions no two
of which are both satisfied in any result of the trial, so that the
$n_{A_i}$ results in which $A_i$ is satisfied are all distinct from the $n_{A_j}$
results in which $A_j$ is satisfied. Then

$$n_{A_1} + n_{A_2} + \cdots + n_{A_{s-1}} \le n,$$

or, if $A_s$ is the condition that no one of the $s - 1$ conditions
$A_1, A_2, \ldots, A_{s-1}$ is satisfied,

$$n_{A_1} + n_{A_2} + \cdots + n_{A_s} = n.$$

Hence, since $p_A = n_A/n$,

$$1 = p_{A_1} + p_{A_2} + \cdots + p_{A_s} \qquad \text{(i)},$$

where $A_1, A_2, \ldots, A_s$ is a set of conditions of which one must
be satisfied and only one can be satisfied when a trial is made.
In particular,

$$1 = p_A + p_{A'}.$$
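Again, since the $n_{A_iB}$ results in which conditions $A_i$ and B
are both satisfied are distinct from the $n_{A_jB}$ results in which
conditions $A_j$ and B are both satisfied,

$$n_B = n_{A_1B} + n_{A_2B} + \cdots + n_{A_sB},$$

so that

$$p_B = p_{A_1B} + p_{A_2B} + \cdots + p_{A_sB} \qquad \text{(ii)}.$$

The last equation but one may be written

$$\frac{n_B}{n} = \frac{n_{A_1B}}{n_{A_1}} \cdot \frac{n_{A_1}}{n} + \frac{n_{A_2B}}{n_{A_2}} \cdot \frac{n_{A_2}}{n} + \cdots + \frac{n_{A_sB}}{n_{A_s}} \cdot \frac{n_{A_s}}{n},$$

and it has been seen that $n_{A_iB}/n_{A_i} = p_{(A_i)B}$. Hence

$$p_B = p_{(A_1)B}\,p_{A_1} + p_{(A_2)B}\,p_{A_2} + \cdots + p_{(A_s)B}\,p_{A_s} \qquad \text{(iii)}.$$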
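The three formulae (i), (ii) and (iii) may be checked on a small case. The sketch below is an illustration only, under assumptions not made in the text: the trial is a throw of two dice with 36 equally likely results, the conditions $A_1, \ldots, A_6$ are "the first die shows i", and B is "the sum of the two dice is at least 9". The helper names `count`, `p` and `p_when` are chosen here for the purpose.

```python
from fractions import Fraction
from itertools import product

results = list(product(range(1, 7), repeat=2))    # n = 36 results
n = len(results)


def count(*conditions):
    """Number of results satisfying every condition given."""
    return sum(all(c(r) for c in conditions) for r in results)


def p(*conditions):
    """Probability, by the rule, that all the given conditions are satisfied."""
    return Fraction(count(*conditions), n)


def p_when(given, cond):
    """p_(given)cond: the trial restricted to results satisfying `given`."""
    return Fraction(count(given, cond), count(given))


# Exactly one of A_1, ..., A_6 is satisfied in every result.
A = [lambda r, i=i: r[0] == i for i in range(1, 7)]
B = lambda r: r[0] + r[1] >= 9

formula_i = sum(p(a) for a in A)                      # should equal 1
formula_ii = sum(p(a, B) for a in A)                  # should equal p_B
formula_iii = sum(p_when(a, B) * p(a) for a in A)     # should equal p_B
print(formula_i, p(B), formula_ii, formula_iii)
```

Exact fractions are used so that the equalities hold without rounding.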

3.  Among  the  conditions  that  are  not  relevant  to  the  result  of 
a  trial,  there  are  some  which  call  for  special  notice.  Consider  for 
instance  the  condition — that  the  trial  has  been  made  at  some 
other  time  or  in  some  other  place  than  those  in  question.  If 
this  were  relevant,  it  would  be  satisfied  by  some  and  not  satisfied 
by  other  results  of  the  trial  at  the  particular  time  and  place 
considered;  and  the  number  of  possible  results  at  the  time  and 
place  considered  would  be  less  than  n,  contrary  to  the  supposition 
in  the  rule.  In  precisely  the  same  way  it  follows  that  a  condition 
— that  another  trial,  whether  of  the  same  kind  or  not,  shall  have 
a  particular  result — is  also  not  relevant. 

If  a  trial  is  repeated,  and  it  is  proposed  to  consider  probabili- 
ties connected  with  the  repeated  trial,  it  is  necessary  to  make 
an  assumption  of  equal  likelihood.  Suppose  there  are  N  possible 
results  each  two  of  which  are  equally  likely  for  the  repeated 
trial, and that in $N_{ij}$ of them the $i$th result occurs at the first
trial and the $j$th at the second. For the repeated trial, subject to
the condition that the $i$th result occurs in the first, there are just

$$N_{i1} + N_{i2} + \cdots + N_{in}$$

results; and each two of them are equally likely. Now it has
just been seen that the result of the first trial is not relevant
to the second, so that the probability that the $j$th result occurs
at the second trial is

$$\frac{N_{ij}}{N_{i1} + N_{i2} + \cdots + N_{in}} = \frac{1}{n}$$

for all values of $i$. From this it follows at once that

$$\frac{N_{ij}}{N} = \frac{1}{n^2},$$

or in words, each two of the $n^2$ results of the repeated trial,
arising by combining any result at the first trial with any result
at the second, are equally likely. This reasoning may clearly
be used to shew that when the trial is repeated $m$ times, each
two of the $n^m$ results are equally likely.

If attention is directed, in the repeated trial, to whether
condition A is satisfied or not, the $N$ results may be divided
into four sets $N_{AA}$, $N_{AA'}$, $N_{A'A}$, $N_{A'A'}$ in number, the notation
being that already used. Then $N_{AA}/(N_{AA} + N_{AA'})$ is the proba-
bility that, if the condition A is satisfied at the first trial, it is
satisfied at the second. It has been seen that the proviso is not
relevant to the result of the second trial. Hence

$$\frac{N_{AA}}{N_{AA} + N_{AA'}} = p_A.$$

Similarly

$$\frac{N_{A'A}}{N_{A'A} + N_{A'A'}} = p_A,$$

and

$$\frac{N_{AA} + N_{AA'}}{N} = p_A,$$

the last equation expressing directly that the probability, that
condition A is satisfied at the first trial, is $p_A$. These relations
give

$$\frac{N_{AA}}{N} = p_A^2, \quad \frac{N_{AA'}}{N} = \frac{N_{A'A}}{N} = p_A(1 - p_A), \quad \frac{N_{A'A'}}{N} = (1 - p_A)^2.$$

In precisely the same way it is shewn that if the trial is
repeated $r + s$ times, the probability, that at $r$ specified trials in
the set the condition A is satisfied and that at the remaining $s$
trials it is not satisfied, is

$$p_A^r (1 - p_A)^s.$$
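This result can be confirmed by direct enumeration in a small case. The sketch below is illustrative and rests on assumptions of its own: the single trial is the throw of a die with six equally likely results, condition A is "a five or a six is thrown", and the trial is repeated four times with A required at the first two trials and excluded at the last two (so that $r = s = 2$).

```python
from fractions import Fraction
from itertools import product

single = range(1, 7)                    # results of the single trial
A = lambda x: x >= 5                    # hypothetical condition A
p_A = Fraction(sum(A(x) for x in single), len(single))

m = 4
repeated = list(product(single, repeat=m))   # the n**m equally likely results

# A satisfied at trials 1 and 2, not satisfied at trials 3 and 4.
pattern = (True, True, False, False)
favourable = sum(tuple(A(x) for x in result) == pattern for result in repeated)
p_pattern = Fraction(favourable, len(repeated))

print(p_pattern, p_A ** 2 * (1 - p_A) ** 2)
```

The enumeration and the product formula give the same fraction.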

If a second trial has just $n'$ possible results, wherever and
whenever it is made, and if each two of these results are
assumed to be equally likely, it may be proposed to deal with
the probabilities regarding the results of the two trials when
performed together. As in the previous cases, an assumption of
equal likelihood must be made. Let there be $N$ results for the
double trial, in $N_{ij}$ of which the $i$th result of the first and the
$j$th result of the second occur. Of the $N$ results, there are

$$N_{1j} + N_{2j} + \cdots + N_{nj}$$

results satisfying the condition that the second component of
the double trial has its $j$th result. This condition is not relevant
to the result of the first component of the double trial, so that

$$N_{1j} = N_{2j} = \cdots = N_{nj}$$

for each value of $j$. It is similarly shewn that

$$N_{i1} = N_{i2} = \cdots = N_{in'}$$

for each value of $i$. Hence

$$\frac{N_{ij}}{N} = \frac{1}{nn'}.$$

In words, each two of the $nn'$ results of the double trial, formed
by taking any result of the first component with any result of
the second, are equally likely.

This  result  also  may  clearly  be  extended  to  a  multiple  trial 
with  any  number  of  different  components. 

Still using the same suffix notation, $(n_{AB'} + n_{A'B})/n$ is the
probability that either condition A or condition B, but not both
of them, may be satisfied. This may be rather more conveniently
expressed by saying that:

Probability that just one of the conditions A and B is satisfied

$$= p_{AB'} + p_{A'B}.$$

Similarly, the probability that at least one of the two conditions
A and B is satisfied

$$= p_{AB} + p_{AB'} + p_{A'B}.$$

Now

$$p_A = p_{AB} + p_{AB'}, \quad p_B = p_{AB} + p_{A'B}.$$

Hence the expressions

$$(\alpha) \quad p_A + p_B - 2p_{AB},$$
$$(\beta) \quad p_A + p_B - p_{AB},$$

give the probabilities that of the two conditions A and B,
($\alpha$) just one, ($\beta$) at least one, is satisfied.

The  corresponding  formulae,  in  relation  to  a  number  of  con- 
ditions greater  than  two,  will  now  be  established. 

Let $A_1, A_2, \ldots, A_n$ be $n$ distinct conditions each of which is
satisfied or is not satisfied, when a trial is made.

There are just $2^n$ symbols such as $p_{A_1A_2\ldots A_iA'_{i+1}\ldots A'_n}$ in which
each suffix occurs either accented or unaccented. It is a result
of (ii) above that

$$p_{A_1} = \sum p_{A_1A_2\ldots A_iA'_{i+1}\ldots A'_n},$$

where the summation extends to all the symbols with $n$ suffixes
in which $A_1$ is unaccented. Similarly,

$$p_{A_1A_2} = \sum p_{A_1A_2\ldots A_iA'_{i+1}\ldots A'_n},$$

where the summation extends to all the symbols in which both
$A_1$ and $A_2$ are unaccented; and so on.

4. A typical symbol in which $r$ suffixes are unaccented, and
$n - r$ are accented, will be denoted by

$$p_{r,\,n-r}.$$

Since $r$ suffixes can be chosen from the $n$ in $\binom{n}{r}$ or $\frac{n!}{r!\,(n-r)!}$
ways, $p_{r,\,n-r}$ stands for any one of a set of $\binom{n}{r}$ symbols, a particular
one being specified by the particular set of $r$ suffixes which are
unaccented.

Similarly particular ones of the symbols $p_{A_1}$, $p_{A_1A_2}$, $p_{A_1A_2A_3}$, ...
will be denoted by $p_1$, $p_2$, $p_3$, ...; so that $p_s$, for instance, is typical
of any one of a set of $\binom{n}{s}$ symbols.

In what follows, $\sum p_{r,\,n-r}$ denotes the sum of the $\binom{n}{r}$ symbols
of which $p_{r,\,n-r}$ is the type; and $\sum p_s$ denotes the sum of the $\binom{n}{s}$
symbols of which $p_s$ is the type.

If now $\sum p_r$ is expressed in terms of the $2^n$ symbols with $n$
suffixes, any particular symbol $p_{s,\,n-s}$ will occur in the sum once
for each distinct set of $r$ unaccented suffixes which it contains.
Hence if $s < r$, $p_{s,\,n-s}$ will not occur at all, while if $s \ge r$, $p_{s,\,n-s}$
will occur $\binom{s}{r}$ times. It follows that

$$\sum p_r = \sum_{s=r}^{n} \binom{s}{r} \sum p_{s,\,n-s},$$

this relation holding for all values of $r$ from 1 to $n$. Hence

$$\sum_{r=t}^{n} (-1)^{r-t} \binom{r}{t} \sum p_r = \sum_{r=t}^{n} \sum_{s=r}^{n} (-1)^{r-t} \binom{r}{t} \binom{s}{r} \sum p_{s,\,n-s}.$$

Now

$$\sum_{r=t}^{s} (-1)^{r-t} \binom{r}{t} \binom{s}{r} = 0, \quad s \ne t,$$

while for $s = t$ the sum reduces to unity. Hence

$$\sum p_{t,\,n-t} = \sum_{r=t}^{n} (-1)^{r-t} \binom{r}{t} \sum p_r \qquad \text{(iv)}.$$

Further,

$$\sum_{s=t}^{n} \sum p_{s,\,n-s} = \sum_{s=t}^{n} \sum_{r=s}^{n} (-1)^{r-s} \binom{r}{s} \sum p_r = \sum_{r=t}^{n} \sum_{s=t}^{r} (-1)^{r-s} \binom{r}{s} \sum p_r;$$

and since

$$\sum_{s=t}^{r} (-1)^{r-s} \binom{r}{s} = (-1)^{r-t} \binom{r-1}{t-1},$$

it follows that

$$\sum_{s=t}^{n} \sum p_{s,\,n-s} = \sum_{r=t}^{n} (-1)^{r-t} \binom{r-1}{t-1} \sum p_r \qquad \text{(v)}.$$

Now $\sum p_{t,\,n-t}$ is the sum of the $\binom{n}{t}$ symbols $p_{A_1A_2\ldots A_tA'_{t+1}\ldots A'_n}$
in which just $t$ of the $n$ suffixes are unaccented. It is therefore the
probability that just $t$ of the $n$ conditions are satisfied; and
similarly $\sum_{s=t}^{n} \sum p_{s,\,n-s}$ is the probability that either $t$ or more of
the $n$ conditions, in other words that at least $t$ of the $n$ conditions,
are satisfied. The formulae (iv) and (v) therefore give the ex-
tension to the case of $n$ conditions of the previous formulae
($\alpha$) and ($\beta$) for the case of two. In particular, the probability
that at least one of the $n$ conditions shall be satisfied is

$$\sum p_1 - \sum p_2 + \sum p_3 - \cdots + (-1)^{n-1} \sum p_n;$$

and therefore the probability that no one of them is satisfied is

$$1 - \sum p_1 + \sum p_2 - \cdots + (-1)^n \sum p_n.$$
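The formulae for "just t" and "at least t" of n conditions lend themselves to a numerical check. The sketch below is an illustration under assumptions that are not in the text: the trial has 30 equally likely results (the integers 1 to 30) and four arbitrarily chosen conditions are imposed. The function names `sum_p`, `exactly`, `at_least` and `direct` are introduced here only for the purpose.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

results = range(1, 31)                       # 30 equally likely results
conditions = [lambda x: x % 2 == 0,          # hypothetical A_1
              lambda x: x % 3 == 0,          # hypothetical A_2
              lambda x: x % 5 == 0,          # hypothetical A_3
              lambda x: x > 20]              # hypothetical A_4
n = len(conditions)


def sum_p(r):
    """Sum, over every set of r conditions, of the probability that all r hold."""
    total = Fraction(0)
    for subset in combinations(conditions, r):
        hits = sum(all(c(x) for c in subset) for x in results)
        total += Fraction(hits, len(results))
    return total


def exactly(t):
    """Formula (iv): probability that just t conditions are satisfied."""
    return sum((-1) ** (r - t) * comb(r, t) * sum_p(r) for r in range(t, n + 1))


def at_least(t):
    """Formula (v), for t >= 1: probability that at least t are satisfied."""
    return sum((-1) ** (r - t) * comb(r - 1, t - 1) * sum_p(r)
               for r in range(t, n + 1))


def direct(t):
    """Direct count of the results in which just t conditions are satisfied."""
    hits = sum(sum(c(x) for c in conditions) == t for x in results)
    return Fraction(hits, len(results))


for t in range(1, n + 1):
    print(t, exactly(t), direct(t), at_least(t))
```

For each t the alternating sums agree with the direct count, which here is still feasible because the number of results is small.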

5. Returning to the formulae

$$p_A = p_{AB} + p_{AB'}, \quad p_B = p_{AB} + p_{A'B},$$

it follows that

$$p_A p_B = p_{AB}(p_{AB} + p_{AB'} + p_{A'B}) + p_{AB'}\,p_{A'B} = p_{AB}(1 - p_{A'B'}) + p_{AB'}\,p_{A'B}.$$

Hence if the condition

$$p_{AB}\,p_{A'B'} - p_{AB'}\,p_{A'B} = 0$$

is satisfied, then

$$p_{AB} = p_A p_B, \quad p_{AB'} = p_A(1 - p_B),$$
$$p_{A'B} = (1 - p_A)p_B, \quad p_{A'B'} = (1 - p_A)(1 - p_B).$$

In this case, the conditions A and B are generally said to be
independent.

It might be expected that when, in this sense, each pair of
the three conditions A, B, C are independent, all probabilities
connected with them could be expressed in terms of $p_A$, $p_B$ and
$p_C$; but this is not necessarily the case.

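Suppose that

$$p_{AB} = p_A p_B, \quad p_{AC} = p_A p_C, \quad p_{BC} = p_B p_C,$$

and that at the same time

$$p_{ABC} = p_A p_B p_C + k.$$

Then

$$p_{A'BC} = p_{BC} - p_{ABC} = (1 - p_A)p_B p_C - k,$$
$$p_{AB'C} = (1 - p_B)p_A p_C - k,$$
$$p_{ABC'} = (1 - p_C)p_A p_B - k.$$

Also

$$p_{AB'C'} = p_{B'C'} - p_{A'B'C'} = (1 - p_B)(1 - p_C) - p_{A'B'C'},$$
$$p_{A'BC'} = (1 - p_A)(1 - p_C) - p_{A'B'C'},$$
$$p_{A'B'C} = (1 - p_A)(1 - p_B) - p_{A'B'C'}.$$

Entering these values in

$$p_{ABC} + p_{A'BC} + p_{AB'C} + p_{ABC'} + p_{AB'C'} + p_{A'BC'} + p_{A'B'C} + p_{A'B'C'} = 1,$$

it follows that

$$p_{A'B'C'} = (1 - p_A)(1 - p_B)(1 - p_C) - k.$$

Hence when A and B, B and C, C and A, are respectively inde-
pendent, all the probabilities connected with them can be
expressed in terms of $p_A$, $p_B$, $p_C$ and another number $k$, which
may have any value between $-k_1$ and $k_2$, where $k_1$ is the least of

$$p_A p_B p_C, \quad p_A(1 - p_B)(1 - p_C), \quad p_B(1 - p_A)(1 - p_C), \quad p_C(1 - p_A)(1 - p_B),$$

and $k_2$ is the least of

$$(1 - p_A)(1 - p_B)(1 - p_C), \quad (1 - p_A)p_B p_C, \quad (1 - p_B)p_A p_C, \quad (1 - p_C)p_A p_B.$$

These limits merely express that none of the eight probabilities
can be negative.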
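A numerical instance shews that such a number k need not be zero. The sketch below is an illustration only; the function names `joint` and `prob`, and the particular values $p_A = p_B = p_C = \frac{1}{2}$ with $k = \frac{1}{8}$, are assumptions chosen here. It builds the eight probabilities, adding k where an even number of the three conditions fail and subtracting it where an odd number fail, and then confirms that each pair of conditions is independent although $p_{ABC}$ differs from $p_A p_B p_C$.

```python
from fractions import Fraction
from itertools import product


def joint(a, b, c, k):
    """Eight probabilities for pairwise-independent A, B, C.
    Keys are triples (A satisfied?, B satisfied?, C satisfied?)."""
    table = {}
    for x, y, z in product((True, False), repeat=3):
        base = (a if x else 1 - a) * (b if y else 1 - b) * (c if z else 1 - c)
        fails = (not x) + (not y) + (not z)      # number of accented suffixes
        table[(x, y, z)] = base + k if fails % 2 == 0 else base - k
    return table


def prob(table, want):
    """Sum of the entries agreeing with `want`, e.g. {0: True, 2: True} for AC."""
    return sum(q for key, q in table.items()
               if all(key[i] == v for i, v in want.items()))


half = Fraction(1, 2)
table = joint(half, half, half, Fraction(1, 8))

print(prob(table, {0: True, 1: True}), half * half)    # p_AB against p_A p_B
print(table[(True, True, True)], half ** 3)            # p_ABC against p_A p_B p_C
```

With these values k stands at its upper limit, so that four of the eight probabilities vanish.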

Also

$$p_{A'BC} = p_B p_C - p_{ABC},$$
$$p_{AB'C'} = (1 - p_B)(1 - p_C) - p_{A'B'C'},$$

$$p_{ABC} + p_B p_C + p_C p_A + p_A p_B - 3p_{ABC} + 3 - 2(p_A + p_B + p_C) + p_B p_C + \cdots - 3p_{A'B'C'} + p_{A'B'C'} = 1,$$

$$2 - 2(p_A + p_B + p_C) + 2(p_B p_C + \cdots) - 2p_{ABC} - 2p_{A'B'C'} = 0,$$

$$(1 - p_A)(1 - p_B)(1 - p_C) + p_A p_B p_C - p_{ABC} - p_{A'B'C'} = 0,$$

$$p_{AB'C'} = (1 - p_B)(1 - p_C) - (1 - p_A)(1 - p_B)(1 - p_C) - p_A p_B p_C + p_{ABC}$$
$$= p_A(1 - p_B)(1 - p_C) - p_A p_B p_C + p_{ABC}$$
$$= p_{ABC} + p_A(1 - p_B - p_C).$$

The various results and formulae that have been now deduced
from the rule, especially formula (iii), will be found to simplify
very materially the calculation of probabilities in complicated
cases.    In  particular,  they  enable  the  calculator  to  dispense  with 
a  continual  reference  to  the  rule  by  utilizing  probabilities  that 
have  already  been  determined.    But,  in  general,  the  real  diffi- 
culties of  calculation  are  connected  with  those  cases,  in  which 
the  number  denoted  by  n  is  great.   The  determination  of  p^ 
involves  the  distinguishing  and  picking  out  of  those  of  the  n 
results,  in  which  the  condition  A  is  satisfied.    That  it  may  be 
impossible  to  do  this  by  a  direct  enumeration  a  simple  example 
will  shew. 

Let us assume that when a coin is spun it is equally likely to
fall head or tail. Then we have seen that when the coin is spun
$n$ times in succession each two of the $2^n$ results, as regards head
and tail, are equally likely. When the coin is spun three times,
what is the probability that there will be a sequence of at least
two heads? The enumeration is quite simple. In the cases
symbolized by

HHH, HHT, THH,

and in no others, the required condition is satisfied. Hence the
probability is $\frac{3}{8}$.

When  the  coin  is  spun  100  times,  what  is  the  probability  that 
there  will  be  a  sequence  of  at  least  10  heads?  The  problem  is  of 
just  the  same  nature  as  the  preceding  one,  except  that  larger 
numbers  are  involved.  The  number  giving  the  possible  results 
contains  31  digits.  Assuming  that  an  inspection  lasting  one 
second  would  enable  one  to  say  whether  a  particular  result 
satisfied the condition or not, it would take over $3 \times 10^{22}$ years
to complete the enumeration. It is not therefore going too far
to say that in this case a direct enumeration for $n_A$ is impossible.
Some  indirect  method  must  be  used.  A  large  part  of  the  following 
chapters  will  be  devoted  to  these  indirect  methods  and  the  approxi- 
mate calculations  which  are  necessarily  connected  with  them. 
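One possible indirect method, given here only as an illustration and not as the method developed in the following chapters, is to count the sequences which contain no run of the required length, classifying them by the number of heads with which they end. The function name `prob_head_run` is an assumption of this sketch, and the year is taken as 365 days.

```python
from fractions import Fraction


def prob_head_run(n, k):
    """Probability of at least one run of k or more heads in n spins,
    head and tail being assumed equally likely at each spin."""
    # counts[j] = number of sequences so far that contain no run of k heads
    #             and end in exactly j consecutive heads (0 <= j < k)
    counts = [1] + [0] * (k - 1)           # the empty sequence
    for _ in range(n):
        after_tail = sum(counts)           # a tail brings the run back to 0
        counts = [after_tail] + counts[:-1]  # a head lengthens each run by 1
    return Fraction(2 ** n - sum(counts), 2 ** n)


print(prob_head_run(3, 2))                 # the three-spin case of the text
print(float(prob_head_run(100, 10)))       # the hundred-spin case

# Time for a direct enumeration at one result per second, in years of 365 days.
years = 2 ** 100 // (365 * 24 * 3600)
print(len(str(2 ** 100)), years)
```

The loop carries only k counters through n steps, in place of an inspection of every one of the $2^n$ results.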
Equal Likelihood.

6.  Before  going  on  to  this,  it  is  well  to  consider  shortly  the 
assumption  or  assumptions  of  equal  likelihood  that  must  be 
made,  if  probabilities  are  to  be  calculated  by  means  of  the  rule. 
It  is  to  be  noticed  that  two  calculators  making  the  same  formal 
assumptions  of  equal  likelihood  necessarily  obtain  the  same 
numerical  value  for  a  probability,  assuming  them  not  to  make 
mistakes  of  calculation.  The  resulting  value  in  no  way  depends 
on  the  meaning  that  either  calculator  may  attach  to  the 
assumptions,  nor  on  whether  or  no  the  assumptions  appear  to 
them  reasonable  assumptions.  So  far,  in  fact,  as  the  calculations 
go,  the  assumptions  are  purely  formal.  It  is  only  when  the 
calculated  probabilities  are  applied  to  questions  of  interest  out- 
side the  calculations  themselves  that  the  assumptions  cease  to 
be  merely  formal.  These  applications  are  continually  being  made. 
As  a  particular  instance,  a  good  deal  of  modern  molecular  physics 
is  bound  up  with  certain  calculated  probabilities.  When  such 
applications  are  made,  the  assumptions  of  equal  likelihood,  on 
which  the  calculations  are  made,  can  no  longer  be  regarded  as 
purely  formal.  They  become  in  fact,  directly  or  indirectly,  as- 
sumptions about  physical  phenomena;  and  the  question  of 
whether  the  assumptions  are  reasonable  becomes  at  once  of 
fundamental  importance. 

It  is  quite  obvious  that  two  different  assumptions  of  equal 
likelihood  will,  in  general,  lead  to  different  values  of  the  calculated 
probabilities. It has been seen that the probability of a sequence
of at least two heads, when a coin is spun three times, is $\frac{3}{8}$; the
assumption having been made that at a single spin head and tail
are equally likely. No surprise would be felt at getting a number
other than $\frac{3}{8}$ had a different assumption been made as regards
the  result  of  a  single  spin.  The  apparently  paradoxical  result  of 
getting  two  different  values  for  the  same  probability  is  always 
to  be  explained  in  this  way.  One  of  the  most  noted  of  these  is 
due  to  M.  Bertrand.    He  solves*  the  question : — 

What  is  the  probability  that  a  chord  of  a  given  circle 
drawn  at  random  is  greater  than  the  side  of  the  inscribed 
equilateral  triangle  ? 

* Calcul des Probabilités, p. 4; the question is asked for 'smaller than' and
is solved for 'greater than.'

He carries out the calculation in three different ways and
arrives at the results $\frac{1}{2}$, $\frac{1}{3}$, $\frac{1}{4}$, respectively. An analysis of the
three  calculations  shews  that  the  assumptions  of  equal  likelihood 
are  not  the  same  in  the  three  ways.  This  fact  however  is  masked 
by  the  comparative  complication  of  the  assumptions;  hence  the 
apparent  paradox. 
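The dependence of the result on the assumption of equal likelihood can be seen in a rough simulation. The sketch below is an illustration only: the three ways of drawing a chord "at random" are the three readings commonly associated with this problem, and their order here is not claimed to be that of Bertrand's own calculations. The function names, the number of samples and the seed are assumptions of the sketch, and the circle is taken of unit radius.

```python
import math
import random


def chord_from_distance(d):
    """Length of a chord of the unit circle at distance d from the centre."""
    return 2.0 * math.sqrt(max(0.0, 1.0 - d * d))


def bertrand(samples=100_000, seed=0):
    """Estimate the chance that a chord exceeds the side of the inscribed
    equilateral triangle under three different assumptions."""
    rng = random.Random(seed)
    side = math.sqrt(3.0)
    hits = [0, 0, 0]
    for _ in range(samples):
        # (a) distance of the chord from the centre equally likely on a radius
        hits[0] += chord_from_distance(rng.random()) > side
        # (b) both end-points equally likely round the circumference
        angle = abs(rng.uniform(0.0, 2.0 * math.pi)
                    - rng.uniform(0.0, 2.0 * math.pi))
        hits[1] += 2.0 * abs(math.sin(angle / 2.0)) > side
        # (c) mid-point of the chord equally likely over the area of the circle
        hits[2] += chord_from_distance(math.sqrt(rng.random())) > side
    return [h / samples for h in hits]


print(bertrand())
```

The three estimates settle near three different values, one for each assumption, although the question asked is in words the same.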

Another  possibility  that  may  be  mentioned  here  is  that  the 
data,  from  which  it  is  proposed  to  calculate  a  probability,  are 
insufficient  whatever  assumption  of  equal  likelihood  is  made.  A 
well-known  example  of  this  is  given  by  the  following  question 
due  *  to  Professor  Boole  : — 

The a priori probabilities of two causes $A_1$ and $A_2$ are $c_1$
and $c_2$ respectively. The probability that, if cause $A_1$ occurs,
an event E will accompany it (whether as consequence of $A_1$
or not) is $p_1$; and the probability that E will accompany $A_2$,
when $A_2$ occurs, is $p_2$. The event E cannot happen in the
absence of both causes $A_1$ and $A_2$. What is the probability of
the event E?

With the notation used in this chapter, there are eight possi-
bilities to be taken into account, denoted by

$$A_1A_2E, \quad A_1A_2'E, \quad A_1'A_2E, \quad A_1'A_2'E,$$
$$A_1A_2E', \quad A_1A_2'E', \quad A_1'A_2E', \quad A_1'A_2'E'.$$

In any case,

$$p_E = p_{A_1A_2E} + p_{A_1A_2'E} + p_{A_1'A_2E} + p_{A_1'A_2'E}$$
$$= p_{A_1E} + p_{A_2E} - p_{A_1A_2E} + p_{A_1'A_2'E}.$$

The data give

$$p_{A_1E} = c_1p_1, \quad p_{A_2E} = c_2p_2, \quad p_{A_1'A_2'E} = 0,$$

so that

$$p_E = c_1p_1 + c_2p_2 - p_{A_1A_2E}.$$

Since the data give no information at all about the simul-
taneous occurrence of the causes $A_1$ and $A_2$, nothing is known
about $p_{A_1A_2E}$, other than the necessary relations that it is equal
to or less than both $p_{A_1E}$ and $p_{A_2E}$. Hence the data are in-
sufficient to determine the probability of E. It is remarkable
that Prof. Boole himself and other distinguished mathematicians
arrived at definite values of $p_E$, which did not agree with each
other.
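That the data are insufficient may be made plain by exhibiting two cases which agree with all of them and yet give different probabilities for E. The sketch below is an illustration under assumed values, $c_1 = c_2 = p_1 = p_2 = \frac{1}{2}$; the function name `check_and_solve` and the two tables are constructions of this sketch. In the first table the two causes never occur together; in the second they always occur together.

```python
from fractions import Fraction


def check_and_solve(dist, c1, p1, c2, p2):
    """dist maps (A1 occurs?, A2 occurs?, E occurs?) to a probability.
    Verify that dist agrees with the data of the question, then return p_E."""
    def p(cond):
        return sum(q for key, q in dist.items() if cond(*key))

    assert sum(dist.values()) == 1
    assert p(lambda a1, a2, e: a1) == c1
    assert p(lambda a1, a2, e: a2) == c2
    assert p(lambda a1, a2, e: a1 and e) == c1 * p1
    assert p(lambda a1, a2, e: a2 and e) == c2 * p2
    assert p(lambda a1, a2, e: e and not a1 and not a2) == 0
    return p(lambda a1, a2, e: e)


h, q = Fraction(1, 2), Fraction(1, 4)

# Case 1: the two causes never occur together.
exclusive = {(True, False, True): q, (True, False, False): q,
             (False, True, True): q, (False, True, False): q}

# Case 2: the two causes always occur together.
together = {(True, True, True): q, (True, True, False): q,
            (False, False, False): h}

p_E_exclusive = check_and_solve(exclusive, h, h, h, h)
p_E_together = check_and_solve(together, h, h, h, h)
print(p_E_exclusive, p_E_together)
```

Both tables satisfy every datum of the question, and the two values of the probability of E differ by exactly the term which the data leave undetermined.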

* Laws of Thought, p. 321.


CHAPTER  II 
DIRECT   CALCULATION   OF   PROBABILITIES 

7.  The  methods  and  formulae  of  the  previous  chapter  will  now 
be  illustrated  by  considering  a  number  of  particular  examples.  In 
each  set  of  cases  the  assumption  of  equal  likelihood  will  be 
stated  explicitly.  The  words  "chance"  and  "probability"  will 
be  used  indifferently  as  being  equivalent  to  each  other. 

The first set of illustrations will be drawn from the game of
bridge. There are $52!/(39!\,13!)$ ways in which a set of 13 cards
can be taken from a pack of 52. It will be assumed that each
two of these are equally likely.

I.  What  is  the  chance  of  holding  at  least  one  long  suit  (i.e. 
a  suit  of  five  or  more)  at  bridge  ? 

In a hand which does not contain a long suit, the distribution
of the 13 cards in four suits must be according to one of the
schemes

(i) 4, 3, 3, 3; (ii) 4, 4, 3, 2; (iii) 4, 4, 4, 1.

The number of sets of 13 cards, of which 4 are hearts,
3 diamonds, 3 clubs and 3 spades, is

$$\frac{13!}{9!\,4!} \left( \frac{13!}{10!\,3!} \right)^3.$$

The suit of 4 cards may be any one of the four suits. Hence
the number of hands corresponding to the first scheme is

$$4 \cdot \frac{13!}{9!\,4!} \left( \frac{13!}{10!\,3!} \right)^3.$$

The number of ways, in which 4 hearts, 4 diamonds, 3 clubs
and 2 spades may be chosen, is

$$\left( \frac{13!}{9!\,4!} \right)^2 \cdot \frac{13!}{10!\,3!} \cdot \frac{13!}{11!\,2!}.$$

There are six ways of choosing the two 4-suits, and then the
other two may be taken in two ways. Hence the number of hands
corresponding to the second scheme is

$$12 \left( \frac{13!}{9!\,4!} \right)^2 \cdot \frac{13!}{10!\,3!} \cdot \frac{13!}{11!\,2!}.$$

Similarly it may be shewn that the number of hands corre-
sponding to the third scheme is

$$4 \left( \frac{13!}{9!\,4!} \right)^3 \cdot 13.$$

Hence the chance of not holding a long suit is

$$\frac{4 \cdot \dfrac{13!}{9!\,4!} \left( \dfrac{13!}{10!\,3!} \right)^3 + 12 \left( \dfrac{13!}{9!\,4!} \right)^2 \cdot \dfrac{13!}{10!\,3!} \cdot \dfrac{13!}{11!\,2!} + 4 \left( \dfrac{13!}{9!\,4!} \right)^3 \cdot 13}{\dfrac{52!}{39!\,13!}}.$$

After reduction this is found to be .351, to three places.
Hence the chance of holding a long suit is .649.
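The reduction can be carried out mechanically. The sketch below is an illustration only; the variable names are chosen here, and `math.comb(13, 4)` stands for $13!/(9!\,4!)$ and so on. It evaluates the three schemes and the resulting chances as exact fractions before rounding.

```python
from fractions import Fraction
from math import comb

hands = comb(52, 13)                       # all sets of 13 cards from 52

scheme_4333 = 4 * comb(13, 4) * comb(13, 3) ** 3
scheme_4432 = 12 * comb(13, 4) ** 2 * comb(13, 3) * comb(13, 2)
scheme_4441 = 4 * comb(13, 4) ** 3 * comb(13, 1)

p_no_long_suit = Fraction(scheme_4333 + scheme_4432 + scheme_4441, hands)
p_long_suit = 1 - p_no_long_suit

print(round(float(p_no_long_suit), 3), round(float(p_long_suit), 3))
```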
II. What is the chance that one's hand at bridge shall contain
just one card of some suit?

A single heart may be chosen in 13 ways; and the remaining
12 cards may be chosen from diamonds, clubs and spades in
$39!/(27!\,12!)$ ways. The same applies to a single card of any
other suit. Hence the required chance is

$$4 \cdot 13 \cdot \frac{39!}{27!\,12!} \Big/ \frac{52!}{39!\,13!}.$$

On reduction this is found to be .320, to three places.
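The expression can be evaluated as follows. It may be remarked that the count includes a hand once for every suit in which it holds just one card, so that a hand with single cards in two or three suits enters more than once. The sketch below, which is an illustration with variable names chosen here, evaluates the expression as it stands and, for comparison, applies the alternating sums of formula (v) to the four conditions "the hand holds just one card of a given suit", on the reading that the chance of at least one such suit is wanted.

```python
from fractions import Fraction
from math import comb

hands = comb(52, 13)

# The expression as it stands: the suit, its single card, then 12 of the other 39.
text_value = Fraction(4 * 13 * comb(39, 12), hands)

# Alternating sums over sets of suits each holding just one card.
sum_p1 = Fraction(comb(4, 1) * 13 * comb(39, 12), hands)
sum_p2 = Fraction(comb(4, 2) * 13 ** 2 * comb(26, 11), hands)
sum_p3 = Fraction(comb(4, 3) * 13 ** 3 * comb(13, 10), hands)
sum_p4 = Fraction(0)     # four single cards cannot make up a hand of 13
at_least_one_single = sum_p1 - sum_p2 + sum_p3 - sum_p4

print(round(float(text_value), 3), round(float(at_least_one_single), 3))
```

The first figure agrees with the value stated; the second is somewhat smaller, the difference arising wholly from hands with more than one suit of a single card.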

III. The number of ways, in which a hand at bridge can be
chosen to hold $n$ ($< 13$) assigned cards, is the number of ways in
which $13 - n$ cards can be selected from $52 - n$. Hence the
probability that a hand at bridge contains $n$ assigned cards is

$$\frac{13!}{(13 - n)!} \cdot \frac{(52 - n)!}{52!}.$$

For $n = 1, 2, 3, 4$, this gives $\frac{1}{4}$, $\frac{1}{17}$, $\frac{11}{850}$, $\frac{11}{4165}$.

IV.  What  is  the  chance  in  a  hand  at  bridge  of  holding  (i)  at 
least  one  ace,  (ii)  just  one  ace,  (iii)  at  least  two  aces? 

From  the  previous  case  the  chances  of  holding  one,  two,  or 
three  assigned  aces,  or  all  four,  are 

$$\frac14,\qquad \frac1{17},\qquad \frac{11}{850},\qquad \frac{11}{4165}.$$

There are six sets of two assigned aces, and four sets of three.
Hence, with the notation of (iv) and (v), p. 10,

$$\sum p_i = 1,\qquad \sum p_{ij} = \frac{6}{17},\qquad \sum p_{ijk} = \frac{22}{425},\qquad p_{1234} = \frac{11}{4165}.$$

It follows that the probability of holding at least one ace

$$= 1 - \frac{6}{17} + \frac{22}{425} - \frac{11}{4165} = 0.696,\ \text{to three places;}$$

the probability of holding just one ace

$$= 1 - 2\cdot\frac{6}{17} + 3\cdot\frac{22}{425} - 4\cdot\frac{11}{4165} = 0.439,\ \text{to three places;}$$

the probability of holding at least two aces

$$= \frac{6}{17} - 2\cdot\frac{22}{425} + 3\cdot\frac{11}{4165} = 0.257,\ \text{to three places.}$$
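
The three sums can be evaluated exactly. The sketch below (Python, with names chosen here for illustration only) forms the sums over sets of one, two, three and four assigned aces from the probability found in Example III, and compares the results with a direct count; the exact values round to 0.696, 0.439 and 0.257.

```python
from fractions import Fraction
from math import comb


def p_assigned(n):
    """Chance that a 13-card hand contains n assigned cards (Example III)."""
    return Fraction(comb(52 - n, 13 - n), comb(52, 13))


# S[k-1] is the sum of p over all sets of k assigned aces.
S = [comb(4, k) * p_assigned(k) for k in range(1, 5)]

at_least_one = S[0] - S[1] + S[2] - S[3]
just_one = S[0] - 2 * S[1] + 3 * S[2] - 4 * S[3]
at_least_two = S[1] - 2 * S[2] + 3 * S[3]

# Direct counts for comparison.
direct_at_least_one = 1 - Fraction(comb(48, 13), comb(52, 13))
direct_just_one = Fraction(4 * comb(48, 12), comb(52, 13))

print(float(at_least_one), float(just_one), float(at_least_two))
```
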

As  already  stated,  the  assumption  underlying  the  above 
calculations  is  that,  when  13  cards  are  chosen  from  a  pack  of  52, 
each  two  sets  are  equally  likely  to  be  chosen.  The  question 
clearly  arises  as  to  whether  this  assumption  is  a  necessary 
consequence  of  the  assumption  that,  when  one  card  is  chosen 
from  a  pack  of  52,  each  two  cards  are  equally  likely  to  be  chosen. 

Let  us  make  the  assumption  that,  when  one  object  is  chosen 
from  a  set  of  n,  each  two  objects  are  equally  likely  to  be  chosen; 
and  let  us  subject  the  choice  to  the  condition  that  one  particular 
object,  say  the  ith,  is  not  chosen.  Then  in  the  restricted  choice, 
subject  to  this  condition,  there  are  just  n  —  1  possible  results;  and 
by  the  assumption  already  made,  each  two  of  these  are  equally 

likely. Hence the probability of any one of them is $\frac{1}{n-1}$.

Now if it is proposed to deal with the probabilities connected
with drawing two objects simultaneously from the set of n, an
assumption of equal likelihood must be made. Suppose there are
just N results each two of which are equally likely, and that in
$N_{ij}$ of these the ith and jth objects are chosen
($N_{ij} = N_{ji}$, $N_{ii} = 0$). Imposing the further condition
that one of the chosen objects is the ith, there are $\sum_j N_{ij}$
possible results of the restricted trial; and each two of these are
equally likely by the assumption already made. Hence the
probability that the other object chosen is the jth is
$N_{ij}/\sum_j N_{ij}$, and

$$\frac{N_{ij}}{\sum_j N_{ij}} = \frac{1}{n-1}.$$

This being true for all values of i and j, it follows that

$$\frac{N_{ij}}{N} = \frac{2}{n(n-1)}.$$

In a similar way it may be shewn that, when a set of r objects
are  chosen  from  n,  the  equal  likelihood  of  each  two  sets  of  r 
follows  as  a  consequence  of  the  assumption  that,  when  one  object 
is  chosen,  each  two  are  equally  likely  to  be  chosen. 

V. A box contains a white and b black balls, and N (= p + q)
balls are simultaneously drawn from it. If p ≤ a, q ≤ b, what is
the chance that p white and q black are drawn? When N is
given, for what value of p is this chance as great as possible?

From a + b balls, N may be drawn in (a + b)!/[N! (a + b − N)!]
ways; and by assumption each two of these are equally likely.
From a balls, p may be drawn in a!/[p! (a − p)!] ways; and each
of these may be combined with the b!/[q! (b − q)!] ways in which
q may be drawn from b. Hence the required chance is

$$\frac{a!\,b!\,N!\,(a+b-N)!}{(a+b)!\,p!\,q!\,(a-p)!\,(b-q)!}.$$

This will be greater than the chance for p + 1 white and q − 1
black or for p − 1 white and q + 1 black, if

$$q(a-p) < (p+1)(b-q+1)$$

and

$$p(b-q) < (q+1)(a-p+1).$$

Moreover, if the first of these inequalities holds, so also does
the one derived from it by increasing p and diminishing q by
the same number, and a similar statement is true of the second.
The chance then is as great as possible when

$$\frac{p+1}{q} > \frac{a+1}{b+1} > \frac{p}{q+1},$$

or

$$p+1 > \frac{(N+1)(a+1)}{a+b+2} > p.$$

The chance is therefore greatest when p is the greatest
integer in (N + 1)(a + 1)/(a + b + 2).
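
The rule for the most likely value of p may be tested against a direct search. This is a minimal sketch in Python; the function names are assumptions made here, and integer counts are compared so that ties (which arise when (N + 1)(a + 1)/(a + b + 2) is itself an integer) are handled exactly.

```python
from math import comb


def ways(a, b, N, p):
    """Number of draws of N balls giving p white (of a) and N - p black (of b)."""
    return comb(a, p) * comb(b, N - p)


def chance(a, b, N, p):
    """Chance of p white and N - p black when N balls are drawn together."""
    return ways(a, b, N, p) / comb(a + b, N)


def most_likely_p(a, b, N):
    """Greatest integer in (N + 1)(a + 1)/(a + b + 2)."""
    return (N + 1) * (a + 1) // (a + b + 2)


print(most_likely_p(5, 3, 4), chance(5, 3, 4, most_likely_p(5, 3, 4)))
```
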

VI. There are n boxes of which the ith contains $a_i$ white
objects and $b_i$ black objects. One of the boxes is chosen and
from it an object is drawn. What is the chance that it is white?

If we denote by $A_i$ the condition that the ith box is chosen,
and by B the condition that a white object is drawn, then by
formula (iii), p. 6,

$$p_B = \sum_{i=1}^{n} p_{A_i}\,p_{(A_i)B}.$$

On the assumptions, that each box is equally likely to be
chosen and that from any box each object is equally likely to
be drawn,

$$p_{A_i} = \frac1n,\qquad p_{(A_i)B} = \frac{a_i}{a_i+b_i},$$

and

$$p_B = \frac1n\sum_{i}\frac{a_i}{a_i+b_i}.$$

VII.  With  the  conditions  of  the  previous  case,  what  is  the 
probability  that  N  consecutive  drawings  give  white  objects,  each 
object  drawn  being  replaced  before  the  next  drawing:  (i)  when 
a  box  is  chosen  before  each  drawing,  (ii)  when  the  box  chosen 
for  the  first  drawing  is  used  throughout  ? 

In the first case the probability of a white object at each
drawing is

$$\frac{\sum p_i}{n},\qquad\text{where}\quad p_i = \frac{a_i}{a_i+b_i},$$

so that (Chap. I) the probability of N consecutive white objects is

$$\left(\frac{\sum p_i}{n}\right)^{N}.$$

If the drawings are all made from the ith box, the probability
of N consecutive white objects is $p_i^N$; and therefore, by formula
(iii), p. 6, the required probability in the second case is

$$\frac{\sum p_i^{N}}{n}.$$

It follows, from a well-known inequality, that the probability
in the second case (if N > 1) is greater than that in the
first, unless the $p_i$ are all equal, when the two coincide.
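
A short numerical sketch of the two cases follows (Python; the function names and the sample boxes are assumptions chosen for illustration). Each box is given as a pair (white, black). The two results agree only when every box has the same proportion of white objects.

```python
def prob_box_rechosen(boxes, N):
    """Case (i): a box is chosen afresh before each of the N drawings."""
    mean = sum(a / (a + b) for a, b in boxes) / len(boxes)
    return mean ** N


def prob_same_box(boxes, N):
    """Case (ii): the box chosen for the first drawing is used throughout."""
    return sum((a / (a + b)) ** N for a, b in boxes) / len(boxes)


boxes = [(1, 3), (2, 2), (3, 1)]  # (white, black) in each box
print(prob_box_rechosen(boxes, 3), prob_same_box(boxes, 3))  # 0.125 0.1875
```
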

VIII.  What  is  the  chance  that  an  integer  chosen  from  the 
first  N  integers  is  divisible  by  at  least  two  of  the  four  primes 
2,3,5,7? 

Of the integers from 1 to N, the number that are divisible
by a prime p is the integral part of N/p, which is represented
by [N/p]. Hence if, when an integer is chosen from the first N,
each two are equally likely to be chosen, the chance that it is
divisible by p is

$$\frac{[N/p]}{N}.$$

Unless N is divisible by p, this is less than 1/p, but its
difference from 1/p is less than 1/N. If N is sufficiently large
the chance is very nearly 1/p. If q is another prime, the chance
that the number chosen is divisible by both p and q is

$$\frac{[N/pq]}{N},$$

and when N is large enough, this is sensibly 1/pq. This reasoning
may be repeated for more than two primes.

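With the notation (p. 10) of (iv) and (v), the relations

$$\sum p_{ij} = \frac{1}{2\cdot 3}+\frac{1}{2\cdot 5}+\frac{1}{2\cdot 7}+\frac{1}{3\cdot 5}+\frac{1}{3\cdot 7}+\frac{1}{5\cdot 7},$$

$$\sum p_{ijk} = \frac{1}{2\cdot 3\cdot 5}+\frac{1}{2\cdot 3\cdot 7}+\frac{1}{2\cdot 5\cdot 7}+\frac{1}{3\cdot 5\cdot 7},$$

$$p_{1234} = \frac{1}{2\cdot 3\cdot 5\cdot 7},$$

are approximately true when N is large enough, the errors
approaching zero as N increases. Now with these values

$$\sum p_{ij} - 2\sum p_{ijk} + 3p_{1234} = \frac{101}{210} - \frac{34}{210} + \frac{3}{210} = \frac13.$$

Hence, when N is large enough, the chance that an integer,
chosen from the first N integers, is divisible by at least two of
the first four primes is sensibly 1/3.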

In  the  same  way  it  may  be  shewn  that  the  probability  of 
divisibility by at least two of the first 10 primes is 0.48 to two
places; or by at least two of the first 20 primes is 0.55. It should
be  noted  that  as  the  number  of  primes  considered  increases,  so 
also  must  the  number  N  to  ensure  reasonable  accuracy  in  the 
inference. 

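The general statement is that, when N is large enough, the
chance that an integer chosen from the first N is divisible by
at least two of the set of distinct primes $p_1, p_2, \dots, p_n$ is

$$1 - \left(1 + \frac{1}{p_1-1} + \frac{1}{p_2-1} + \dots + \frac{1}{p_n-1}\right)\left(1-\frac{1}{p_1}\right)\left(1-\frac{1}{p_2}\right)\cdots\left(1-\frac{1}{p_n}\right).$$
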
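The general expression, 1 − (1 + Σ 1/(p − 1)) Π (1 − 1/p) taken over the chosen primes, can be evaluated exactly. The following Python sketch uses function names assumed here for illustration; it also counts directly among the first N integers, which for N = 210 and the primes 2, 3, 5, 7 reproduces the limiting value without error, since 210 is their product.

```python
from fractions import Fraction


def limit_at_least_two(primes):
    """Limiting chance of divisibility by at least two of the given primes."""
    none = Fraction(1)
    for p in primes:
        none *= Fraction(p - 1, p)  # chance of divisibility by none of them
    factor = 1 + sum(Fraction(1, p - 1) for p in primes)
    return 1 - factor * none


def exact_at_least_two(N, primes):
    """Chance for an integer chosen from 1..N, by direct counting."""
    hits = sum(1 for m in range(1, N + 1)
               if sum(m % p == 0 for p in primes) >= 2)
    return Fraction(hits, N)


FIRST_20 = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29,
            31, 37, 41, 43, 47, 53, 59, 61, 67, 71]
print(limit_at_least_two(FIRST_20[:4]))                 # 1/3
print(round(float(limit_at_least_two(FIRST_20[:10])), 2))
print(round(float(limit_at_least_two(FIRST_20)), 2))
```
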
IX. There are n letters and n corresponding envelopes, and
one letter is put into each envelope. This can be done in n!
ways. It is assumed that each two of these distributions are
equally likely. What is the probability (i) that just r letters go
into their corresponding envelopes, (ii) that no letter goes into
its corresponding envelope?

There are n!/(n − r)! ways in which a given set of r letters can
be put into r out of n envelopes: and in just one of these ways
will each of the r letters go into its corresponding envelope, so
that with the notation of (iv) and (v)

$$p_r = \frac{(n-r)!}{n!}.$$

Also r letters can be chosen out of n in n!/[r! (n − r)!] ways, so that

$$\sum p_r = \frac{1}{r!}.$$

The probability, that just r letters go into their corresponding
envelopes, is

$$\sum p_r - (r+1)\sum p_{r+1} + \frac{(r+1)(r+2)}{1\cdot 2}\sum p_{r+2} - \dots$$

$$= \frac{1}{r!}\left\{1 - 1 + \frac{1}{1\cdot 2} - \frac{1}{1\cdot 2\cdot 3} + \dots + (-1)^{n-r}\frac{1}{(n-r)!}\right\}.$$

Unless n − r is quite small, this is very nearly the same as

$$\frac{e^{-1}}{r!}.$$

The probability, that at least one letter goes into its
corresponding envelope, is

$$1 - \frac{1}{2!} + \frac{1}{3!} - \dots + (-1)^{n-1}\frac{1}{n!}.$$

It follows that, unless n is quite small, the probability that no
letter goes right is sensibly $e^{-1}$. The numerical value is 0.368, to
three places.
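
The formula for just r letters going right may be compared with a direct enumeration of all n! distributions when n is small. The sketch below is in Python; the names `prob_exactly` and `brute_force` are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import permutations
from math import exp, factorial


def prob_exactly(n, r):
    """(1/r!) {1 - 1 + 1/2! - ... + (-1)^(n-r)/(n-r)!}."""
    series = sum(Fraction((-1) ** k, factorial(k)) for k in range(n - r + 1))
    return series / factorial(r)


def brute_force(n, r):
    """Count the distributions with exactly r letters in their own envelopes."""
    hits = sum(1 for perm in permutations(range(n))
               if sum(i == x for i, x in enumerate(perm)) == r)
    return Fraction(hits, factorial(n))


print(float(prob_exactly(10, 0)), exp(-1))
```
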



X.  A  line  of  unit  length  is  divided  into  M  equal  parts ;  and 
it  is  assumed  that  when  a  point  is  marked  on  the  line  it  is  as 
likely  to  be  in  any  one  part  as  in  any  other.  N  points  are 
marked  on  the  line.  What  is  the  chance  that  they  all  lie  on 
a  segment  of  the  line  made  up  of  r  consecutive  parts  ? 

There are M − r + 1 segments of the line made up of r
consecutive parts. That segment, which starts from the ith division
reckoning from the left-hand end, will be called the ith segment.
Denoting the chance that all N points lie on the part of the
line common to the ith, jth, ... segments by $p_{ij\dots}$, the required
chance q is

$$q = \sum p_i - \sum p_{ij} + \sum p_{ijk} - \dots,$$

the sums applying to all sets of 1, 2, 3, ... segments respectively;
for this is the chance that there is at least one segment which
contains all N points.

Now if i and j are not consecutive numbers, the part of the line
common to the ith and the jth segments, is also common to the
ith, kth, lth, ..., jth segments, where k, l, ... are any numbers
lying between i and j. If j > i, just r + i − j parts are common
to the ith and jth segments, so long as this number is not
negative. Hence, if r + i > j > i,

$$p_{ij} = \left(\frac{r+i-j}{M}\right)^{N} = p_{ikj} = p_{iklj} = \dots,$$

where k, l, ... are any numbers lying between i and j; and
if j ≥ r + i,

$$0 = p_{ij} = p_{ikj} = p_{iklj} = \dots.$$

If j = i + s + 1, $p_{ij}$ occurs once in $\sum p_{ij}$, s times in $\sum p_{ijk}$,
½s(s − 1) times in $\sum p_{ijkl}$, and so on, when the above equalities
are taken account of. Hence if s > 0, $p_{ij}$ will occur in the
expression for q with zero as coefficient, so that

$$q = \sum p_i - \sum p_{i,\,i+1} = (M-r+1)\left(\frac{r}{M}\right)^{N} - (M-r)\left(\frac{r-1}{M}\right)^{N}.$$

The probability, that the N points are in some segment of
r parts and in no smaller segment, is

$$\sum p_i - 2\sum p_{ij} + 3\sum p_{ijk} - \dots.$$

If j = i + s + 1, the coefficient of $p_{ij}$ is

$$-2 + 3s - 4\,\frac{s(s-1)}{1\cdot 2} + \dots.$$

This is 1, if s = 1; and 0, if s > 1.

Hence the probability q', that there is just one segment of
r parts in which all N points lie, is

$$q' = (M-r+1)\left(\frac{r}{M}\right)^{N} - 2(M-r)\left(\frac{r-1}{M}\right)^{N} + (M-r-1)\left(\frac{r-2}{M}\right)^{N}.$$

If, in the formulae for q and q', we write

$$\frac{r}{M} = x,$$

then

$$q = Nx^{N-1} - (N-1)x^{N} + \dots,\qquad q' = \frac{N\{N-1-(N-3)x\}x^{N-2}}{M} + \dots,$$

where the unwritten terms contain higher negative powers of M.
When M is very large, the unwritten terms will be
very small. Hence the probability, that there is some continuous
segment of the line, of length x, which contains all N points,
is very nearly

$$Nx^{N-1} - (N-1)x^{N};$$

and the probability, that the distance between the two extreme
points lies between x and x + 1/M, is very nearly

$$\frac{N\{N-1-(N-3)x\}x^{N-2}}{M}.$$
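
For small M and N the two chances can be found by enumerating every placing of the points. The Python sketch below is an illustration only; it assumes the closed forms q = (M − r + 1)(r/M)^N − (M − r)((r − 1)/M)^N and q' = q − (M − r)((r − 1)/M)^N + (M − r − 1)((r − 2)/M)^N, the latter for 2 ≤ r ≤ M − 1, and the function names are chosen here.

```python
from fractions import Fraction
from itertools import product


def q_formula(M, N, r):
    """Chance that some segment of r consecutive parts holds all N points."""
    return ((M - r + 1) * Fraction(r, M) ** N
            - (M - r) * Fraction(r - 1, M) ** N)


def q_prime_formula(M, N, r):
    """Chance that just one such segment holds all N points (2 <= r <= M - 1)."""
    return ((M - r + 1) * Fraction(r, M) ** N
            - 2 * (M - r) * Fraction(r - 1, M) ** N
            + (M - r - 1) * Fraction(r - 2, M) ** N)


def covering_segments(points, M, r):
    """Number of segments of r consecutive parts containing every point."""
    lo, hi = min(points), max(points)
    return sum(1 for start in range(M - r + 1)
               if start <= lo and hi < start + r)


def brute(M, N, r):
    """Return (chance of at least one, chance of just one) by enumeration."""
    counts = [covering_segments(pts, M, r)
              for pts in product(range(M), repeat=N)]
    total = len(counts)
    return (Fraction(sum(c >= 1 for c in counts), total),
            Fraction(sum(c == 1 for c in counts), total))


print(q_formula(5, 3, 2), q_prime_formula(5, 3, 2))  # 29/125 26/125
```
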

XI.  The  conditions  being  as  in  the  previous  example,  what 
is  the  probability  that,  N  being  greater  than  M,  at  least  one  of 
the  divisions  shall  contain  no  point  ? 

If $q_i$ is the chance that the ith division contains no point,
$q_{ij}$ the chance that neither the ith nor the jth division contains
a point, and so on,

$$q_i = \left(1-\frac{1}{M}\right)^{N},\qquad q_{ij} = \left(1-\frac{2}{M}\right)^{N},\ \dots,$$

and the required chance is

$$q = \sum q_i - \sum q_{ij} + \sum q_{ijk} - \dots,$$

so that

$$q = \sum_{r=1}^{M-1}(-1)^{r-1}\frac{M!}{r!\,(M-r)!}\left(1-\frac{r}{M}\right)^{N}.$$

For quite small values of M and N, the numerical value of q
may be calculated from this formula; but clearly some method
of approximation must be used when M and N are large.

Suppose, for instance, that

$$N = M(\log M + k),$$

where M is large and k/log M is small and positive. Then

$$e^{-N/M} = \frac{e^{-k}}{M},$$

so that

$$\frac{M!}{r!\,(M-r)!}\left(1-\frac{r}{M}\right)^{N} = \frac{e^{-kr}}{r!}\cdot\frac{M!}{M^{r}(M-r)!}\left\{e\left(1-\frac{r}{M}\right)^{M/r}\right\}^{r(\log M+k)}.$$

For any given value of r, the factor

$$\frac{M!}{M^{r}(M-r)!}\left\{e\left(1-\frac{r}{M}\right)^{M/r}\right\}^{r(\log M+k)}$$

rapidly approaches unity as M increases, while

$$\frac{e^{-kr}}{r!}$$

rapidly approaches zero as r increases. Hence for the value of
N assumed,

$$1 - q = \sum_{r=0}^{\infty}(-1)^{r}\frac{e^{-kr}}{r!}\ \text{very nearly}$$

$$= e^{-e^{-k}}\ \text{very nearly}$$

$$= e^{-Me^{-N/M}}\ \text{nearly.}$$

As a particular case, if N is nearly equal to

$$M(\log M - \log\log 2),$$

the probability that every division contains a point is very
nearly ½.
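
The exact alternating sum and the approximation exp(−M e^{−N/M}) may be compared numerically. The following Python sketch is an illustration under the stated assumptions; the function names and the choice M = 200 are made here and are not taken from the text.

```python
from fractions import Fraction
from math import comb, exp, log


def prob_all_occupied(M, N):
    """Exact chance that each of M divisions receives at least one of N points."""
    numer = sum((-1) ** r * comb(M, r) * (M - r) ** N for r in range(M + 1))
    return Fraction(numer, M ** N)


def approx_all_occupied(M, N):
    """The approximation exp(-M e^(-N/M)) for large M."""
    return exp(-M * exp(-N / M))


M = 200
N = round(M * (log(M) - log(log(2))))  # the particular case giving about 1/2
print(N, float(prob_all_occupied(M, N)), approx_all_occupied(M, N))
```
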

XII. A coin is spun n times. The probability of its shewing
head at the first spin is p'; while at any subsequent spin the
probability, that the coin shews the same face as at the previous
spin, is p. What is the probability that the coin shews head at
the nth spin?

If the probability is $p_n$, then, by applying the formula (iii) on
p. 6 to the nth spin,

$$p_n = p\,p_{n-1} + (1-p)(1-p_{n-1}) = (2p-1)\,p_{n-1} + 1 - p.$$

Similarly

$$p_{n-1} = (2p-1)\,p_{n-2} + 1 - p,$$

while, for the second spin,

$$p_2 = (2p-1)\,p' + 1 - p.$$

Hence

$$p_n = (2p-1)^{n-1}p' + [1 + (2p-1) + (2p-1)^2 + \dots + (2p-1)^{n-2}](1-p)$$

$$= \tfrac12 + (2p-1)^{n-1}\left(p' - \tfrac12\right).$$

Unless p is either very small or very nearly 1, the required
probability is very nearly ½ after quite a moderate number
of spins.
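
The closed form ½ + (2p − 1)^{n−1}(p' − ½) can be checked against the step-by-step relation. A minimal Python sketch follows; `p_first` stands for p' and the function names are assumptions made for illustration.

```python
def head_prob_recursive(p, p_first, n):
    """Apply p_n = p p_(n-1) + (1 - p)(1 - p_(n-1)) starting from p_1 = p'."""
    prob = p_first
    for _ in range(n - 1):
        prob = p * prob + (1 - p) * (1 - prob)
    return prob


def head_prob_closed(p, p_first, n):
    """The closed form 1/2 + (2p - 1)^(n-1) (p' - 1/2)."""
    return 0.5 + (2 * p - 1) ** (n - 1) * (p_first - 0.5)


for n in (1, 2, 5, 20):
    print(n, head_prob_recursive(0.7, 0.9, n), head_prob_closed(0.7, 0.9, n))
```
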


CHAPTER  III 

INDIRECT  METHODS  OF  CALCULATING 
PROBABILITIES 

8. If p is the probability of an event on a single trial, then
$p^n$ is the probability that it will happen at each of n consecutive
trials; and therefore $1 - p^n$ is the probability that it will fail to
happen at least once in n consecutive trials. Hence, in N sets
of n consecutive trials, the probability that the event will fail to
happen at least once in each set is $(1 - p^n)^N$. This probability
clearly approaches zero as N increases. Therefore $1 - (1 - p^n)^N$,
which is the probability that in one at least of the N sets of
n consecutive trials the event happens every time, approaches
unity as N increases.

The result may be expressed in a slightly different way,
since the N sets of n consecutive trials constitute a set of
Nn consecutive trials. Thus, in Nn consecutive trials, the
probability that the event happens n times consecutively, once
or more, the sequences consisting either of the first n, or the
second n, ..., or the Nth n, is $1 - (1 - p^n)^N$. The probability
that, at some stage of the Nn trials, the event will happen n or
more times consecutively, is clearly greater than $1 - (1 - p^n)^N$.
To take a particular case, the probability that in 7200 trials
a spun coin will fall head 10 or more times consecutively is
greater than $1 - \left(1 - \frac{1}{2^{10}}\right)^{720}$, i.e. is greater than ½, assuming head
and tail equally likely.

It should be noted that, no matter how small p and how great
n may be, $1 - (1 - p^n)^N$ will differ very little from unity when
N is large enough.

Though  the  above  process  gives  a  lower  limit  to  the  probability 
required,  the  limit  is  clearly  much  too  small,  and  the  true  value 
must  be  arrived  at  in  some  other  way.  In  this  and  a  number  of 
similar  questions,  which  are  generally  classed  under  the  head 
of  "duration  of  play,"  the  required  probability  can  often  be 
determined by obtaining a finite difference equation which it
must satisfy.

Let p be the probability of an event at a single trial. Denote
by $u_M$ the probability that, by or before the Mth trial, the
event has happened n or more times consecutively; and by $v_M$
the probability that, at the Mth trial, the event has happened
n times consecutively without having done so before the Mth
trial. Then

$$v_M = u_M - u_{M-1}.$$

In order that the Mth trial may complete the first set of n
consecutive happenings, the following conditions must be satisfied:

(i) It must not have happened n or more times consecutively,
up to and including the (M − n − 1)th trial;

(ii) It must not happen at the (M − n)th trial;

(iii) It must happen at each trial from the (M − n + 1)th to
the Mth.

The probabilities of these three independent events are

$$1 - u_{M-n-1},\qquad 1 - p,\qquad p^n.$$

Hence

$$v_M = (1 - u_{M-n-1})(1-p)\,p^n,$$

or

$$u_M - u_{M-1} = (1 - u_{M-n-1})(1-p)\,p^n.$$

Putting

$$1 - u_M = w_M,$$

so that $w_M$ is the probability that the event has not happened
n times consecutively up to and including the Mth trial,

$$w_M - w_{M-1} + (p^n - p^{n+1})\,w_{M-n-1} = 0.$$

When M = 1, 2, ..., n − 1, the value of $w_M$ is unity, and
$w_n = 1 - p^n$. The probability that, in the first n + 1 trials the
event shall happen at least n times consecutively, is clearly
$p^n + (1-p)p^n$; so that $w_{n+1} = 1 - 2p^n + p^{n+1}$.

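To determine $w_M$, put

$$f(y) = \sum_{M=1}^{\infty} w_M\,y^{M-1}.$$

Then

$$\{1 - y + (p^n - p^{n+1})y^{n+1}\}\,f(y) = w_1 + \sum_{M=2}^{n+1}(w_M - w_{M-1})\,y^{M-1}$$

$$+ \sum_{M=n+2}^{\infty} y^{M-1}\{w_M - w_{M-1} + (p^n - p^{n+1})\,w_{M-n-1}\}$$

$$= 1 - p^n y^{n-1} - (p^n - p^{n+1})\,y^n.$$

Hence $w_M$ is the coefficient of $y^{M-1}$ in the expansion of

$$\frac{1 - p^n y^{n-1} - (p^n - p^{n+1})\,y^n}{1 - y + (p^n - p^{n+1})\,y^{n+1}}$$

in ascending powers of y. Since

$$1 + y f(y) = \frac{1 - p^n y^n}{1 - y + (p^n - p^{n+1})\,y^{n+1}},$$

the value of $w_M$ is expressed rather more simply as the coefficient
of $y^M$ in this latter expression. It remains to determine this
numerically.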

If $x_i$ (i = 1, 2, ..., n + 1) are the roots of the equation

$$x^{n+1} - x^n + p^n - p^{n+1} = 0,$$

which are easily shewn to be all distinct,

$$1 - y + (p^n - p^{n+1})\,y^{n+1} = \prod_{i=1}^{n+1}(1 - x_i y).$$

When 1 + y f(y) is represented as the sum of partial fractions
in the usual way, it takes the form

$$\sum_{i=1}^{n+1}\frac{x_i(x_i^n - p^n)}{x_i^n\{(n+1)x_i - n\}}\cdot\frac{1}{1 - x_i y},$$

so that

$$w_M = \sum_{i=1}^{n+1}\frac{x_i(x_i^n - p^n)}{x_i^n\{(n+1)x_i - n\}}\,x_i^M.$$

If p is either very small or very nearly unity, the solution of
the equation for x requires special treatment. It will be assumed
that this is not the case, and that n is not too small a number.
It may then be shewn that the equation for x has one root very
nearly equal to unity, given approximately by

$$x_1 = 1 - p^n + p^{n+1}.$$

It may also be shewn that the moduli of the remaining roots
do not differ much from $p(1-p)^{1/n}$, and that they cannot
therefore be near unity.

Hence, except for comparatively small values of M, the
quantities $x_2^M$, $x_3^M$, ... are all extremely small compared to $x_1^M$;
and a good approximation is given by

$$w_M = \frac{x_1(x_1^n - p^n)}{x_1^n\{(n+1)x_1 - n\}}\,x_1^M.$$

In particular, if p = ½ and n is not too small,

$$w_M = \frac{2^{n+1} - 2}{2^{n+1} - n - 1}\left(1 - \frac{1}{2^{n+1}}\right)^{M+1}\ \text{very nearly,}$$

except for comparatively small values of M. For instance, if
a spun coin is equally likely to fall head or tail, the probability,
that in M spins it will at some stage fall head at least n times
running, is nearly

$$1 - \left(1 - \frac{1}{2^{n+1}}\right)^{M+1}.$$

The numerical significance of such a formula as this is rather
difficult to grasp. As an illustration, if the coin is spun steadily
at the rate of 12 spins a minute, the probability of each of the
following events is almost exactly ½, viz.

(i) The occurrence of a sequence of 10 or more heads, in
1 hour 58 minutes;

(ii) The occurrence of a sequence of 20 or more heads, in
85 days;

(iii) The occurrence of a sequence of 40 or more heads, in
241,724 years.

In particular, the probability of the case referred to at the end
of Chap. I, when n = 10, M = 100, is 0.05 very nearly.
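
The difference equation for the chance of no run of n can be iterated directly and set beside the approximation 1 − (1 − 1/2^{n+1})^{M+1}. The Python sketch below is an illustration only; the function names are assumptions, and the starting values are those of the text together with the chance 1 for no trials at all.

```python
from itertools import product


def no_run_prob(p, n, M):
    """Chance of no run of n consecutive happenings in M trials, by recurrence."""
    c = p ** n - p ** (n + 1)
    w = [1.0] * n + [1.0 - p ** n]  # values for 0, 1, ..., n trials
    for m in range(n + 1, M + 1):
        w.append(w[m - 1] - c * w[m - n - 1])
    return w[M]


def run_prob_approx(n, M):
    """Approximation for a fair coin: 1 - (1 - 2^-(n+1))^(M+1)."""
    return 1.0 - (1.0 - 2.0 ** -(n + 1)) ** (M + 1)


def run_prob_brute(n, M):
    """Enumerate all 2^M sequences of a fair coin (small M only)."""
    target = "H" * n
    hits = sum(1 for spins in product("HT", repeat=M)
               if target in "".join(spins))
    return hits / 2 ** M


spins = 12 * 118  # 1 hour 58 minutes at 12 spins a minute
print(1 - no_run_prob(0.5, 10, spins), run_prob_approx(10, spins))
print(1 - no_run_prob(0.5, 10, 100), run_prob_approx(10, 100))
```
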

Examples. 

9.  I.  Three  persons  play  as  follows.  Two  play  a  single  game ; 
and  the  loser  sits  out,  while  the  third  person  comes  in  for  the 
second  game.  At  the  end  of  each  game,  the  loser  sits  out;  and 
the  one  who  was  not  playing  comes  in.  In  each  single  game  the 
two  engaged  have  equal  chances  of  winning.  The  play  goes  on 
till  one  player  has  won  n  consecutive  games.  What  is  the  chance 
that the play will be over by or before the end of the Nth game?

If play is not over at the end of the Nth game, denote by $u_{N,r}$
the chance that at the end of the Nth game its winner has won
r (r = 1, 2, ..., n − 1) consecutive games. He has an even chance
of winning the (N + 1)th game; and therefore

$$u_{N+1,\,r+1} = \tfrac12\,u_{N,\,r},\qquad (r = 1, 2, \dots, n-2).$$

Now $u_{N+1,1}$ is the chance that the winner of the Nth game
loses the (N + 1)th, whatever number of games he had won
consecutively. Hence

$$u_{N+1,\,1} = \tfrac12\,(u_{N,1} + u_{N,2} + \dots + u_{N,\,n-1}).$$
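Using the previous relation, this is

$$u_{N+1,\,1} = \frac12\,u_{N,1} + \frac14\,u_{N-1,1} + \dots + \frac{1}{2^{n-1}}\,u_{N-n+2,\,1};$$

and since

$$u_{N+1,\,r+1} = \tfrac12\,u_{N,\,r},$$

each of the quantities $u_{N,r}$ (r = 1, 2, ..., n − 1) satisfies this linear
difference equation. Now $u_N$, the chance that play is not over
at the end of the Nth game, is given by

$$u_N = u_{N,1} + u_{N,2} + \dots + u_{N,\,n-1}.$$

Hence

$$u_{N+1} = \frac12\,u_N + \frac14\,u_{N-1} + \dots + \frac{1}{2^{n-1}}\,u_{N-n+2}.$$

Moreover $u_N = 1$, where N = 1, 2, ..., n − 1. Hence, if $x_i$ (i = 1,
2, ..., n − 1) are the roots of the equation

$$x^{n-1} - \frac12\,x^{n-2} - \frac14\,x^{n-3} - \dots - \frac{1}{2^{n-1}} = 0,$$

$$u_N = \sum_{i=1}^{n-1} A_i\,x_i^N,$$

where

$$1 = \sum_{i=1}^{n-1} A_i\,x_i^m,\qquad (m = 1, 2, \dots, n-1).$$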

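It follows that

$$\begin{vmatrix} u_N & x_1^N & x_2^N & \cdots & x_{n-1}^N \\ 1 & x_1 & x_2 & \cdots & x_{n-1} \\ 1 & x_1^2 & x_2^2 & \cdots & x_{n-1}^2 \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_1^{n-1} & x_2^{n-1} & \cdots & x_{n-1}^{n-1} \end{vmatrix} = 0.$$

The developed form of this determinant is easily found to be

$$u_N = f(1)\sum_{i=1}^{n-1}\frac{x_i^{N-1}}{(1-x_i)\,f'(x_i)},$$

where

$$f(x) = x^{n-1} - \frac12\,x^{n-2} - \frac14\,x^{n-3} - \dots - \frac{1}{2^{n-1}} = \frac{x^n - x^{n-1} + \dfrac{1}{2^n}}{x - \frac12}.$$

The equation

$$x^n - x^{n-1} + \frac{1}{2^n} = 0$$

is practically the same as that which has been discussed in the
previous example. It has one root, $x_1$, very nearly equal to $1 - \frac{1}{2^n}$;
and the moduli of the others differ only slightly from ½. Hence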
when N is large, the first term in the above formula is very
much larger than any of the others, i.e.

$$u_N = \frac{f(1)\,x_1^{N-1}}{(1-x_1)\,f'(x_1)}\ \text{very nearly.}$$

Now

$$f'(x_1) = \frac{(n-1)\,x_1^n - \left(\tfrac32 n - 2\right)x_1^{n-1} + \tfrac12(n-1)\,x_1^{n-2} - \dfrac{1}{2^n}}{\left(x_1 - \tfrac12\right)^2};$$

if terms containing $\frac{1}{2^{2n}}$ are neglected,

$$f'(x_1) = 2\left(1 - \frac{1}{2^n}\right)^{2n-4}.$$

Hence, when n is not too small and N is large compared to n,

$$u_N = \left(1 - \frac{1}{2^n}\right)^{N-2n+3}.$$
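
The chance that play is not over can be computed step by step from the relations between successive games, and compared both with the linear difference equation and with an estimate of the form (1 − 1/2^n)^{N−2n+3} obtained by keeping only the root near unity. The Python sketch below is an illustration; the function names are assumptions, and the estimate is only rough for moderate n.

```python
def not_over(n, N):
    """Chance that nobody has won n games running after N games (n >= 2, N >= 1)."""
    state = [0.0] * n  # state[r]: play not over and the last winner has r wins
    state[1] = 1.0     # after the first game its winner has one win
    for _ in range(N - 1):
        nxt = [0.0] * n
        nxt[1] = 0.5 * sum(state[1:])  # the last winner loses to the newcomer
        for r in range(1, n - 1):
            nxt[r + 1] = 0.5 * state[r]  # the last winner wins again
        state = nxt
    return sum(state[1:])


def not_over_recurrence(n, N):
    """The same chance from u_(N+1) = u_N/2 + u_(N-1)/4 + ... (n - 1 terms)."""
    u = [None] + [1.0] * (n - 1)  # u[1..n-1] = 1
    for m in range(n, N + 1):
        u.append(sum(u[m - k] / 2 ** k for k in range(1, n)))
    return u[N]


n, N = 10, 100
estimate = (1 - 2.0 ** -n) ** (N - 2 * n + 3)
print(not_over(n, N), estimate)
```
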


II.  If  the  probability  of  a  coin  falling  head  is  p,  what  is 
the  probability  that,  at  some  stage  in  N  consecutive  spins,  the 
number of heads exceeds the number of tails by r?

(This  is  the  same  problem  as  that  of  a  person  playing  against 
another  with  unlimited  resources,  which  may  be  expressed  as 
follows : — If  a  person  receives  a  counter  each  time  he  wins  a  game 
and  pays  one  each  time  he  loses,  and  if  he  starts  with  r  counters, 
what  is  the  chance  that  before  N  games  are  over  he  will  have 
lost  all  his  counters,  assuming  that  the  chance  of  his  winning 
any  games  is  p  ?) 

Suppose $u_{N,r}$ is the required probability. It is clear, from
considering the two possibilities with respect to the first spin, that

$$u_{N,\,r} = p\,u_{N-1,\,r-1} + (1-p)\,u_{N-1,\,r+1}.$$

Moreover, from the meaning of the symbols, $u_{N,0}$ is unity; and
$u_{N,N}$ is $p^N$, for all values of N.

Consider now a sequence of r + 2s spins, in which there are
r + s heads and s tails. At the end of the sequence, the heads
exceed the tails by r. Denote by f(r, s) the number of such
distinct sequences, for which the heads do not exceed the tails
by r until the end of the sequence. The probability of any one
of these sequences is $p^{r+s}(1-p)^s$. Hence the probability, that
in r + 2s spins the heads exceed the tails by r at the end and
by less than r at each previous stage, is $f(r, s)\,p^{r+s}(1-p)^s$.
It follows at once that

$$u_{r+2s,\,r} = p^r\sum_{t=0}^{s} f(r,t)\,p^t(1-p)^t.$$

Entering this expression for $u_{r+2s,\,r}$ in the above difference
equation, and noting that the result is true for all values of p,
it is found that

$$f(r,t) = f(r-1,\,t) + f(r+1,\,t-1).$$

Also, from the meaning of f(r, s), it follows that

$$f(a,0) = 1,\qquad f(0,b) = 0,$$

for all values of a and b. Direct calculation gives

$$f(r,1) = r,\qquad f(r,2) = \tfrac12\,r(r+3),\qquad f(r,3) = \tfrac16\,r(r+4)(r+5).$$

This suggests that

$$f(r,s) = \frac{1}{s!}\,r(r+s+1)(r+s+2)\cdots(r+2s-1),$$

which is verified immediately on entering this value in the
functional equation. Hence

$$u_{r+2s,\,r} = p^r\sum_{t=0}^{s}\frac{r(r+t+1)\cdots(r+2t-1)}{t!}\,p^t(1-p)^t.$$

This is clearly also the value of $u_{r+2s+1,\,r}$.

The numerical determination of $u_{r+2s,\,r}$ for given values of r
and s from this formula would be laborious. It may however be
shewn that, for all values of x from 0 to ¼ inclusive, the series

$$1 + rx + \frac{r(r+3)}{2!}\,x^2 + \dots + \frac{r(r+t+1)\cdots(r+2t-1)}{t!}\,x^t + \dots$$

is convergent and has the sum

$$\left(\frac{1-\sqrt{1-4x}}{2x}\right)^{r},$$

where $\sqrt{1-4x}$ denotes the positive square root. When p = ½,
p(1 − p) = ¼; and for any other value of p, p(1 − p) is less than ¼.
Also when x = p(1 − p), $\sqrt{1-4x}$ is 1 − 2p or 2p − 1, according
as p is less or greater than ½. The series in $u_{r+2s,\,r}$, when taken
to infinity, is therefore $(1-p)^{-r}$ or $p^{-r}$, according as p is less than
or greater than ½. It follows that, when N is great enough,

$$u_{N,\,r} = \left(\frac{p}{1-p}\right)^{r},\quad p < \tfrac12;$$

$$\phantom{u_{N,\,r}} = 1,\quad p \ge \tfrac12.$$

It has been seen that, however small p may be, the probability
of a run of r consecutive heads approaches unity, as N is taken
greater for all values of r.

The present section shews that, if p < ½, the probability of
the heads exceeding the tails by r, however large N may be,
diminishes towards zero as r increases.
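
The count f(r, s) and the resulting probability may be checked numerically. In the Python sketch below, which is an illustration only, f(r, s) is written in the equivalent form r/(r + 2s) times the number of ways of choosing s things out of r + 2s; the function names are assumptions, and the series is summed in floating point, so that N should be kept moderate (a few hundred spins).

```python
from functools import lru_cache
from math import comb


def f(r, s):
    """Sequences of r+s heads and s tails whose lead first reaches r at the end."""
    if s == 0:
        return 1
    if r == 0:
        return 0
    return r * comb(r + 2 * s, s) // (r + 2 * s)


def reach_series(p, r, N):
    """p^r * sum_{t<=s} f(r,t) (p(1-p))^t with s the integral part of (N-r)/2."""
    s = (N - r) // 2
    if s < 0:
        return 0.0
    x = p * (1 - p)
    return p ** r * sum(f(r, t) * x ** t for t in range(s + 1))


@lru_cache(maxsize=None)
def reach_dp(p, r, N):
    """The same chance from the difference equation on the first spin."""
    if r == 0:
        return 1.0
    if N == 0:
        return 0.0
    return p * reach_dp(p, r - 1, N - 1) + (1 - p) * reach_dp(p, r + 1, N - 1)


print(reach_series(0.3, 3, 400), (0.3 / 0.7) ** 3)
```
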

III.  What  is  the  probability  that,  in  a  sequence  of  M  spins, 
the  tails  shall  never  be  in  excess  of  the  heads,  assuming  head 
and  tail  equally  likely? 

This  question  has  several  interesting  applications.  Consider 
first  the  case,  in  which  a  sequence  of  2N  spins  results  in  N  heads 
and N tails. Denote by ψ(N) the number of these sequences,
in which the heads are in excess of the tails at every stage except
the last; and by φ(N) the number of the sequences, in which
the heads are either equal to or in excess of the tails at each stage.
In each of the ψ(N) sequences, the first two spins must be head
and the last must be tail. Removing the first and the last, there
remains a sequence of N − 1 heads and N − 1 tails, in which the
heads are equal to or in excess of the tails at each stage.
Conversely, by prefixing a head and annexing a tail to each sequence
of N − 1 heads and N − 1 tails in which the heads are equal to
or in excess of the tails at each stage, a sequence of N heads
and N tails is formed such that the heads are in excess of the
tails at each stage except the last. Hence

$$\psi(N) = \phi(N-1).$$

Now each of the φ(N) sequences must either be a ψ(N)
sequence, or there must be a maximum value of n, such that it
begins with a ψ(n) sequence; and in the latter case, what follows
the first 2n spins must form a φ(N − n) sequence. Hence

$$\phi(N) = \sum_{n=1}^{N}\psi(n)\,\phi(N-n),$$

with the convention φ(0) = 1. It is also clear that ψ(1) = 1.
Combining these two equations, it follows that

$$\psi(N+1) = \sum_{n=1}^{N}\psi(n)\,\psi(N-n+1).$$

Now put

$$f(x) = \sum_{a=1}^{\infty}\psi(a)\,x^a.$$

Then

$$[f(x)]^2 = \sum_{a,\,b}\psi(a)\,\psi(b)\,x^{a+b} = \sum_{n=2}^{\infty}\ \sum_{a=1}^{n-1}\psi(a)\,\psi(n-a)\,x^n$$

$$= \sum_{n=2}^{\infty}\psi(n)\,x^n = f(x) - x.$$

Since ψ(a) is essentially positive, this gives

$$2f(x) = 1 - \sqrt{1-4x},$$

where the positive square root must be taken, and the infinite
series used is convergent if 4x is less than unity. Then, comparing
coefficients,

$$\psi(N) = \frac{(2N-2)!}{N!\,(N-1)!},\qquad \phi(N) = \frac{(2N)!}{(N+1)!\,N!}.$$

Suppose  now  that  there  are  just  F{M)  sequences  of  M  spins, 
in  which  the  tails  are  at  no  stage  in  excess  of  the  heads.  If  M 
is  odd,  say  2N  —  1,  we  can  pass  to  a  sequence  of  2N  spins,  in 
which  the  tails  are  at  no  stage  in  excess  of  the  heads,  by  annexing 
either  a  head  or  tail  to  the  sequence  of  2N  —  1 ;  and  in  this  way 
all  sequences  of  2N,  satisfying  the  condition,  may  be  formed. 

Hence

$$F(2N) = 2F(2N-1).$$

If M is even, say 2N, a tail cannot be annexed to any one of
the φ(N) sequences, in which the heads are equal to or in excess
of the tails at every stage except the last. Hence

$$F(2N+1) = 2F(2N) - \phi(N).$$

Combining these two equations, we have

$$F(2N) - 4F(2N-2) = -2\phi(N-1).$$

Since F(2) is 2, this gives

$$F(2N) = 2^{2N} - \sum_{r=0}^{N-1} 2^{2N-2r-1}\,\phi(r).$$

It follows that in a sequence of 2N spins the probability, that
the tails are never in excess of the heads, is

$$1 - \sum_{r=0}^{N-1}\frac{(2r)!}{2^{2r+1}\,(r+1)!\,r!}.$$

This may also be put in the form

$$\sum_{r=N}^{\infty}\frac{(2r)!}{2^{2r+1}\,(r+1)!\,r!};$$

and it will be shewn in the next chapter (p. 42) that, if N is
not too small, it is very nearly equal to $\frac{1}{\sqrt{\pi N}}$.
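
The probability that the tails are never in excess of the heads in 2N spins, namely 1 − Σ (2r)!/[2^{2r+1} (r + 1)! r!] with r from 0 to N − 1, can be compared with a direct enumeration and with the estimate 1/√(πN). The Python sketch below is an illustration with function names chosen here; the comparison with the central binomial term (2N)!/(N! N! 2^{2N}) is an added check, not a statement of the text.

```python
from fractions import Fraction
from itertools import product
from math import comb, factorial, pi, sqrt


def phi(N):
    """Sequences of N heads and N tails with heads never behind: (2N)!/((N+1)! N!)."""
    return comb(2 * N, N) // (N + 1)


def prob_never_behind(N):
    """1 - sum_{r<N} (2r)! / (2^(2r+1) (r+1)! r!) for 2N spins of a fair coin."""
    return 1 - sum(Fraction(factorial(2 * r),
                            2 ** (2 * r + 1) * factorial(r + 1) * factorial(r))
                   for r in range(N))


def brute_never_behind(M):
    """Enumerate all 2^M sequences and keep those with tails never in excess."""
    good = 0
    for spins in product((1, -1), repeat=M):  # +1 for head, -1 for tail
        lead = 0
        for step in spins:
            lead += step
            if lead < 0:
                break
        else:
            good += 1
    return Fraction(good, 2 ** M)


print(float(prob_never_behind(100)), 1 / sqrt(pi * 100))
```
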

A more general question of the type of those just considered is the determination of the probability that, in a sequence of spins, the heads shall never exceed the tails by more than $r$, and the tails shall never exceed the heads by more than $s$. The question may also be put as follows.

IV. A and B play a game at which A's chance of losing is $p$. To begin with, A has $2r$ counters and B has $2s$, where $r + s = n$. Each time A wins a game, B pays him a counter; and each time he loses a game, he pays B a counter. What is the probability that A will have lost all his counters before or at the end of the $2N$th game?

Let $u_{r,N}$ denote the probability; so that, for all values of $N$, $u_{0,N} = 1$, $u_{n,N} = 0$. The chance that A loses two games consecutively is $p^2$, the chance that he gains one and loses the other is $2p(1-p)$, and the chance that he gains both is $(1-p)^2$. Hence

$$\begin{aligned}
u_{2,N} &= p^2\,u_{1,N-1} + 2p(1-p)\,u_{2,N-1} + (1-p)^2\,u_{3,N-1},\\
&\ \ \vdots\\
u_{r,N} &= p^2\,u_{r-1,N-1} + 2p(1-p)\,u_{r,N-1} + (1-p)^2\,u_{r+1,N-1},\\
&\ \ \vdots\\
u_{n-1,N} &= p^2\,u_{n-2,N-1} + 2p(1-p)\,u_{n-1,N-1}.
\end{aligned}$$

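This is a system of linear difference equations for

$$u_{1,N},\ u_{2,N},\ \ldots,\ u_{n-1,N}.$$

If the particular solution, independent of $N$, is

$$u_{r,N} = C_r,$$

then

$$p^2\,C_{r-1} + [2p(1-p) - 1]\,C_r + (1-p)^2\,C_{r+1} = 0,$$

with the conditions

$$C_0 = 1, \qquad C_n = 0.$$

These conditions give

$$C_r = \frac{p^{2r}(1-p)^{2n-2r} - p^{2n}}{(1-p)^{2n} - p^{2n}}.$$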
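If now

$$u_{r,N} = C_r + A_r\,\alpha^N,$$

then

$$\begin{aligned}
A_1\,\alpha &= 2p(1-p)\,A_1 + (1-p)^2\,A_2,\\
A_2\,\alpha &= p^2\,A_1 + 2p(1-p)\,A_2 + (1-p)^2\,A_3,\\
&\ \ \vdots\\
A_r\,\alpha &= p^2\,A_{r-1} + 2p(1-p)\,A_r + (1-p)^2\,A_{r+1},\\
&\ \ \vdots\\
A_{n-1}\,\alpha &= p^2\,A_{n-2} + 2p(1-p)\,A_{n-1};
\end{aligned}$$

and $\alpha$ is given by

$$\begin{vmatrix}
2p(1-p)-\alpha & (1-p)^2 & 0 & \cdots & 0 & 0\\
p^2 & 2p(1-p)-\alpha & (1-p)^2 & \cdots & 0 & 0\\
0 & p^2 & 2p(1-p)-\alpha & \cdots & 0 & 0\\
\vdots & & & \ddots & & \vdots\\
0 & 0 & 0 & \cdots & p^2 & 2p(1-p)-\alpha
\end{vmatrix} = 0,$$

where there are $n-1$ rows and columns.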

Let $x_1$ and $x_2$ denote the roots of the equation

$$p^2x^2 + \{2p(1-p) - \alpha\}\,x + (1-p)^2 = 0.$$

The determinant equation is

$$\frac{x_1^{\,n} - x_2^{\,n}}{x_1 - x_2} = 0;$$

on putting

$$\alpha = 2p(1-p)(1 + \cos\theta),$$

the determinant equation becomes

$$\frac{\sin n\theta}{\sin\theta} = 0,$$

so that the $n-1$ values of $\alpha$ are

$$4p(1-p)\cos^2\frac{t\pi}{2n}, \qquad (t = 1, 2, \ldots, n-1).$$

The difference equation for the coefficients $A_r$ is

$$(1-p)^2\,A_{r+1} + \{2p(1-p) - \alpha\}\,A_r + p^2\,A_{r-1} = 0,$$

giving

$$A_r = \left(\frac{p}{1-p}\right)^{r}\left(B_t\sin\frac{rt\pi}{n} + B_t'\cos\frac{rt\pi}{n}\right).$$

Since $u_{0,N} = 1$, $u_{n,N} = 0$, it follows, taking account of the particular solution, that $A_0 = A_n = 0$; and therefore $B_t' = 0$. Hence

$$u_{r,N} = C_r + \left(\frac{p}{1-p}\right)^{r}\sum_{t=1}^{n-1} B_t\,\sin\frac{rt\pi}{n}\,\{4p(1-p)\}^{N}\cos^{2N}\frac{t\pi}{2n}.$$

The $n-1$ constants $B_t$ will be determined by the conditions

$$u_{r,0} = 0, \qquad (r = 1, 2, \ldots, n-1).$$

The probability, that B will lose all his $2s$ counters by or before the end of the $2N$th game, say $v_{s,N}$, will be found by putting $1-p$ for $p$ and $n-r$ for $r$ in $u_{r,N}$. The probability, that neither player has lost the whole of his counters at the end of the $2N$th game, is $1 - u_{r,N} - v_{s,N}$. This is the probability that in $2N$ spins with a coin, for which the chance of falling head is $p$, the excess of heads above tails at each stage is between $2r-2$ and $-2s+2$.

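If $p = \tfrac12$,

$$u_{r,N} + v_{s,N} = 1 + \sum_{t=1}^{n-1} B_t\left\{\sin\frac{rt\pi}{n} + \sin\frac{(n-r)t\pi}{n}\right\}\cos^{2N}\frac{t\pi}{2n} = 1 + 2\,{\sum}'\,B_t\,\sin\frac{rt\pi}{n}\cos^{2N}\frac{t\pi}{2n},$$

where ${\sum}'$ is the sum for odd values of $t$ from 1 to $n-1$.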

To fix the ideas, suppose $n$ even and equal to $2m$. Then if $w_{r,s,N}$ is the probability that, at each stage in $2N$ spins of a coin for which head and tail are equally likely, the excess of heads above tails lies between $2r-2$ and $-2s+2$ inclusive,

$$w_{r,s,N} = -2\sum_{i=1}^{m} B_{2i-1}\,\sin\frac{r(2i-1)\pi}{2m}\,\cos^{2N}\frac{(2i-1)\pi}{4m},$$

and $w_{1,2m-1,1} = \tfrac34$, $w_{r,2m-r,1} = 1$ $(r > 1)$.

Hence

$$0 = \begin{vmatrix}
\tfrac12 w_{r,s,N} & \sin\frac{r\pi}{2m}\cos^{2N}\frac{\pi}{4m} & \sin\frac{3r\pi}{2m}\cos^{2N}\frac{3\pi}{4m} & \cdots & \sin\frac{(2m-1)r\pi}{2m}\cos^{2N}\frac{(2m-1)\pi}{4m}\\[4pt]
\tfrac38 & \sin\frac{\pi}{2m}\cos^{2}\frac{\pi}{4m} & \sin\frac{3\pi}{2m}\cos^{2}\frac{3\pi}{4m} & \cdots & \sin\frac{(2m-1)\pi}{2m}\cos^{2}\frac{(2m-1)\pi}{4m}\\[4pt]
\tfrac12 & \sin\frac{2\pi}{2m}\cos^{2}\frac{\pi}{4m} & \sin\frac{6\pi}{2m}\cos^{2}\frac{3\pi}{4m} & \cdots & \sin\frac{2(2m-1)\pi}{2m}\cos^{2}\frac{(2m-1)\pi}{4m}\\
\vdots & & & & \vdots\\
\tfrac12 & \sin\frac{m\pi}{2m}\cos^{2}\frac{\pi}{4m} & \sin\frac{3m\pi}{2m}\cos^{2}\frac{3\pi}{4m} & \cdots & \sin\frac{m(2m-1)\pi}{2m}\cos^{2}\frac{(2m-1)\pi}{4m}
\end{vmatrix}$$

determines $w_{r,s,N}$. When $N$ is sufficiently great,

$$\cos^{2N}\frac{(2i-1)\pi}{4m}, \qquad (i = 2, 3, \ldots, m)$$

is very small compared to

$$\cos^{2N}\frac{\pi}{4m};$$

so that, for large values of $N$,

$$w_{r,s,N} = A\cos^{2N}\frac{\pi}{4m}\,\sin\frac{r\pi}{2m},$$

where $A$ is very nearly constant.
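The difference equations of problem IV lend themselves to direct numerical iteration. The sketch below is an illustration under stated assumptions, not the author's procedure: the function names are hypothetical, `limiting_value` assumes $p \ne \tfrac12$ and is the $N$-independent solution fixed by the boundary values 1 and 0, and `eigen_residual` checks that $(p/(1-p))^r \sin(rt\pi/n)$ satisfies the equation for the coefficients when $\alpha = 4p(1-p)\cos^2(t\pi/2n)$.

```python
import math

def ruin_probabilities(p, n, N):
    """u[r]: chance that A, holding 2r of the 2n counters, is ruined within 2N games."""
    q = 1.0 - p
    u = [1.0] + [0.0] * n              # N = 0: A is already ruined only if r = 0
    for _ in range(N):
        new = u[:]                     # the end values u[0] = 1 and u[n] = 0 never change
        for r in range(1, n):
            new[r] = p * p * u[r - 1] + 2 * p * q * u[r] + q * q * u[r + 1]
        u = new
    return u

def limiting_value(p, n, r):
    """Solution independent of N with values 1 at r = 0 and 0 at r = n (p != 1/2)."""
    rho = p / (1.0 - p)
    return (rho ** (2 * r) - rho ** (2 * n)) / (1.0 - rho ** (2 * n))

def eigen_residual(p, n, t):
    """Largest violation of the A_r equation by (p/q)^r sin(r t pi / n)."""
    q = 1.0 - p
    alpha = 4 * p * q * math.cos(t * math.pi / (2 * n)) ** 2
    A = [(p / q) ** r * math.sin(r * t * math.pi / n) for r in range(n + 1)]
    return max(
        abs(q * q * A[r + 1] + (2 * p * q - alpha) * A[r] + p * p * A[r - 1])
        for r in range(1, n)
    )

if __name__ == "__main__":
    print(ruin_probabilities(0.45, 5, 3000))
    print([limiting_value(0.45, 5, r) for r in range(6)])
    print(eigen_residual(0.3, 6, 2))
```

For a large number of games the iterated values settle to the solution independent of $N$, since every value of $\alpha$ is less than unity.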

V. A box contains $n$ objects; when one is drawn, each is equally likely to be drawn. An operation consists of drawing an object from the box and replacing it by a white object. What is the probability that, after $r$ operations, there are just $x$ white objects in the box, assuming that there were $a$ white objects initially?

Denote the required probability by $p_{x,r}$. If $x$ is less than $a$, $p_{x,r} = 0$ for all values of $r$.

If there are just $x$ white objects after $r+1$ operations, there must be either $x$ or $x-1$ after $r$ operations.

If there are $x$ after $r$ operations, a white one must be drawn in order that there may be $x$ after $r+1$ operations; the probability of this is $\dfrac{x}{n}$. If there are $x-1$ after $r$ operations, no one of them must be drawn in order that there may be $x$ white after $r+1$ operations; the probability of this is $\dfrac{n-x+1}{n}$. Hence

$$p_{x,r+1} = \frac{x}{n}\,p_{x,r} + \frac{n-x+1}{n}\,p_{x-1,r}$$

for all values of $x$ from $a$ to $n$.
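Taking $a$ for $x$, this gives

$$p_{a,r+1} = \frac{a}{n}\,p_{a,r}, \qquad p_{a,r} = \frac{a^r}{n^r}.$$

Taking $a+1$ for $x$, it gives

$$p_{a+1,r+1} - \frac{a+1}{n}\,p_{a+1,r} = \frac{n-a}{n}\,p_{a,r} = \frac{(n-a)\,a^r}{n^{r+1}}.$$

The general solution of this is

$$n^r\,p_{a+1,r} = C\,(a+1)^r - (n-a)\,a^r;$$

and since $p_{a+1,0} = 0$, we find

$$p_{a+1,r} = (n-a)\left[\left(\frac{a+1}{n}\right)^{r} - \left(\frac{a}{n}\right)^{r}\right].$$

Taking $a+2$ for $x$,

$$p_{a+2,r+1} - \frac{a+2}{n}\,p_{a+2,r} = \frac{n-a-1}{n}\,p_{a+1,r} = \frac{(n-a-1)(n-a)\,[(a+1)^r - a^r]}{n^{r+1}},$$

the solution of which, when $p_{a+2,0} = 0$, is

$$p_{a+2,r} = \frac{(n-a-1)(n-a)}{1\cdot 2}\left[\left(\frac{a+2}{n}\right)^{r} - 2\left(\frac{a+1}{n}\right)^{r} + \left(\frac{a}{n}\right)^{r}\right].$$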


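There is no difficulty in verifying, by an induction, the general formula suggested by these particular cases, viz.

$$p_{x,r} = \frac{(n-a)!}{(n-x)!\,(x-a)!}\left[\left(\frac{x}{n}\right)^{r} - (x-a)\left(\frac{x-1}{n}\right)^{r} + \frac{(x-a)(x-a-1)}{1\cdot 2}\left(\frac{x-2}{n}\right)^{r} - \cdots\right].$$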


The probability, that after $r$ $(> n-a)$ operations all the objects in the box are white, is

$$1 - (n-a)\left(1 - \frac{1}{n}\right)^{r} + \frac{(n-a)(n-a-1)}{1\cdot 2}\left(1 - \frac{2}{n}\right)^{r} - \cdots.$$

When $r = pn$, and $n$ is not too small, this is sensibly

$$(1 - e^{-p})^{\,n-a}.$$
For  instance,  if  n  =  100,  a  =  0,  the  probability,  that  the  objects 
will  all  be  white  after  1200  operations,  exceeds  j^o^oi)- 
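The general formula of problem V can be compared with a direct iteration of the difference equation, and the instance $n = 100$, $a = 0$, 1200 operations can be evaluated exactly. This is a sketch only; the function names are hypothetical and exact fractions are used so that the alternating sum loses no accuracy.

```python
from fractions import Fraction
from math import comb, exp

def prob_white(n, a, x, r):
    """Chance of just x white objects after r operations, starting from a white."""
    if x < a or x > n:
        return Fraction(0)
    series = sum(
        (-1) ** k * comb(x - a, k) * Fraction(x - k, n) ** r
        for k in range(x - a + 1)
    )
    return comb(n - a, x - a) * series

def prob_white_by_recurrence(n, a, r):
    """The same probabilities for x = 0..n, found by stepping the difference equation."""
    row = [Fraction(0)] * (n + 1)
    row[a] = Fraction(1)
    for _ in range(r):
        new = [Fraction(0)] * (n + 1)
        for x in range(a, n + 1):
            gained = Fraction(n - x + 1, n) * row[x - 1] if x > a else 0
            new[x] = Fraction(x, n) * row[x] + gained
        row = new
    return row

if __name__ == "__main__":
    exact = float(prob_white(100, 0, 100, 1200))
    print(exact, (1 - exp(-12)) ** 100)
```

The two routes agree term by term for small boxes, and for the instance in the text the exact value and the exponential approximation differ only in the fifth decimal place.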


CHAPTER  IV 
METHODS  OF  APPROXIMATION 

10. It will have been seen in the previous chapters that, while the formula for a probability connected with a comparatively small number of trials is often a complicated numerical function, the approximate expression for the probability, when the number of trials is great, takes a relatively simple form.

In the present chapter, it is proposed to obtain approximate expressions for the probabilities of various results, on the understanding that the number of trials is very great.

The particular case chosen for investigation is that of a series of $N$ spins of a coin, where $N$ is a large number; but it will be clear that the method of approximation has other applications. The case, in which the coin is equally likely to fall head or tail, is first dealt with.

11. A. In a sequence of $N$ spins there are $2^N$ possible results, all of which are equally probable, taking account of the order in which heads and tails follow each other. To determine the number of these which give $r$ heads and $N-r$ tails, is the same as finding the number of distinct ways in which $r$ things may be chosen from $N$. Hence the probability, that the series of spins results in $r$ heads and $N-r$ tails, is

$$\frac{N!}{r!\,(N-r)!}\cdot\frac{1}{2^N}.$$

So  long  as  N  and  r  are  small  numbers,  there  is  no  difficulty 
in  evaluating  this  formula  numerically ;  but  it  is  obvious  that 
direct  numerical  calculation  is  out  of  the  question,  when  numbers 
running  into  thousands  have  to  be  dealt  with.  For  instance, 
the  labour  of  determining  directly  the  probability,  that  in 
10,000  spins  the  number  of  heads  lies  between  4900  and  5100, 
would  be  prohibitive. 

To deal with any such calculations some method of approximation is absolutely necessary; that is to say, some approximate formula for $n!$, when $n$ is a large number. What is known as Stirling's theorem serves this purpose. It states that

$$n! = \sqrt{2\pi n}\;n^{n}e^{-n}\left(1 + \frac{1+\theta_n}{12n}\right),$$

where $\theta_n$ approaches the limit zero as $n$ increases.

The $\theta_n$ of the formula is already small, when $n$ is comparatively small. If $n > 10$, $0 < \theta_n < 0.01$; so that, for values of $n$ exceeding 1000, the proportional error involved in omitting the factor $1 + \dfrac{1+\theta_n}{12n}$ is less than 0.00009.
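The size of the correction to the leading term of Stirling's formula can be examined numerically. The sketch below assumes the leading term $\sqrt{2\pi n}\,n^n e^{-n}$ and works through logarithms (`math.lgamma`) so that large $n$ causes no overflow; the function name is hypothetical.

```python
from math import exp, lgamma, log, pi

def stirling_ratio(n):
    """n! divided by sqrt(2*pi*n) * n**n * exp(-n), computed through logarithms."""
    log_factorial = lgamma(n + 1)                       # log(n!)
    log_leading = 0.5 * log(2 * pi * n) + n * log(n) - n
    return exp(log_factorial - log_leading)

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, stirling_ratio(n) - 1, 1 / (12 * n))
```

The excess of the ratio over unity follows $1/12n$ closely, and at $n = 1000$ it is below 0.00009.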

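This formula will first be used to obtain a convenient approximation to the probability of a given excess of heads in a sequence of spins. To make the calculation as symmetrical as possible, suppose that the number of spins is $2N$ and the number of heads $N+r$, so that $2r$ is the excess of heads over tails. The probability of this is

$$\frac{(2N)!}{(N+r)!\,(N-r)!}\cdot\frac{1}{2^{2N}}.$$

If, in the above formula, $a_n$ be written for $\dfrac{1+\theta_n}{12n}$, then

$$\begin{aligned}
\frac{(2N)!}{(N+r)!\,(N-r)!\,2^{2N}}
&= \sqrt{\frac{N}{\pi(N^2-r^2)}}\;\frac{N^{2N}}{(N+r)^{N+r}(N-r)^{N-r}}\;\frac{1+a_{2N}}{(1+a_{N+r})(1+a_{N-r})}\\[4pt]
&= \frac{1}{\sqrt{\pi N}}\left(1-\frac{r^2}{N^2}\right)^{-\frac12}\frac{1}{\left(1+\frac{r}{N}\right)^{N+r}\left(1-\frac{r}{N}\right)^{N-r}}\;\frac{1+a_{2N}}{(1+a_{N+r})(1+a_{N-r})}.
\end{aligned}$$

Let

$$D = \left(1+\frac{r}{N}\right)^{N+r}\left(1-\frac{r}{N}\right)^{N-r};$$

then, since $r/N$ is a proper fraction,

$$\begin{aligned}
\log D &= (N+r)\left[\frac{r}{N} - \frac12\frac{r^2}{N^2} + \frac13\frac{r^3}{N^3} - \cdots\right] - (N-r)\left[\frac{r}{N} + \frac12\frac{r^2}{N^2} + \frac13\frac{r^3}{N^3} + \cdots\right]\\[4pt]
&= \frac{r^2}{N} + \frac16\frac{r^4}{N^3} + \frac1{15}\frac{r^6}{N^5} + \cdots,
\end{aligned}$$

and

$$D = e^{\frac{r^2}{N} + \frac16\frac{r^4}{N^3} + \frac1{15}\frac{r^6}{N^5} + \cdots}.$$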

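Hence the probability is

$$\frac{1}{\sqrt{\pi N}}\,e^{-\frac{r^2}{N}}\;R,$$

where

$$R = \left(1-\frac{r^2}{N^2}\right)^{-\frac12} e^{-\frac16\frac{r^4}{N^3} - \frac1{15}\frac{r^6}{N^5} - \cdots}\;\frac{1+a_{2N}}{(1+a_{N+r})(1+a_{N-r})}.$$

When $N$ is large, it has been seen that the last factor is very nearly unity. In the other factors, write $x$ for $r^2/N$. They become

$$\left(1-\frac{x}{N}\right)^{-\frac12} e^{-\frac16\frac{x^2}{N} - \frac1{15}\frac{x^3}{N^2} - \cdots}.$$

When $x/N$ is small, this is very nearly unity; but it diminishes as $x/N$ increases. Hence

$$\frac{1}{\sqrt{\pi N}}\,e^{-\frac{r^2}{N}}$$

is a good approximation to the required probability, so long as $r^2/N$ is not too large; but it gives too great a value as $r^2/N$ increases. It is of course to be noticed that, unless $r^2/N$ is small enough, the numerical value of the probability is inappreciable.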
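The quality of the approximation $e^{-r^2/N}/\sqrt{\pi N}$ can be seen by setting it beside the exact binomial term. The following is a minimal sketch with hypothetical function names; the values $N = 500$, $r = 10$ and $r = 100$ are chosen only for illustration.

```python
from math import comb, exp, pi, sqrt

def exact_probability(N, r):
    """Chance of N + r heads in 2N spins of a fair coin."""
    return comb(2 * N, N + r) / 4 ** N

def approximate_probability(N, r):
    """The approximation exp(-r^2 / N) / sqrt(pi * N)."""
    return exp(-r * r / N) / sqrt(pi * N)

if __name__ == "__main__":
    for r in (0, 10, 30, 100):
        e, a = exact_probability(500, r), approximate_probability(500, r)
        print(r, e, a, e / a)
```

For moderate $r^2/N$ the ratio of the two is very nearly unity; for $r = 100$ the approximation is noticeably too great, though both values are then inappreciable.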

B.  A  precisely  similar  result  can  be  obtained  when  the  number 
of  spins  is  odd;  but  in  dealing  with  large  values  of  N,  there 
would  be  no  real  loss  of  generality  in  taking  the  number  of 
spins  always  even. 

That this approximation lends itself readily to calculation will be clear, by considering the question suggested above. In $2N$ spins the chance, that the number of heads lies between $N+p$ and $N-p$, is approximately

$$\sum_{r=-p}^{p} \frac{1}{\sqrt{\pi N}}\,e^{-\frac{r^2}{N}}.$$

Putting

$$r = x\sqrt{N}, \qquad r + 1 = (x + \delta x)\sqrt{N},$$

this is sensibly the same as

$$\frac{1}{\sqrt{\pi}}\int_{-p/\sqrt{N}}^{\,p/\sqrt{N}} e^{-x^2}\,dx.$$

If, as in the question referred to above, $2N = 10{,}000$, $p = 100$, $p/\sqrt{N}$ is $\sqrt{2}$; and tables give the value of the integral to be 0.95.
As  a  further  illustration,  since  (from  tables) 

VtT  •    -1-8 

the  probability,  that   the  difference  between  the  number   of 

heads  and  tails  in  N  spins  will  exceed  2  54  \/N,  is  less  than  1/100. 
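The calculation described as prohibitive by hand is easy to carry out exactly with integer arithmetic, which gives a check on the integral. This is a sketch only: the function name is hypothetical, and the limits $N - p$ and $N + p$ are assumed to be included. The integral $\frac{1}{\sqrt\pi}\int_{-c}^{c} e^{-x^2}dx$ is the error function `math.erf(c)`.

```python
from math import comb, erf, sqrt

def exact_window(spins, p):
    """Chance that the number of heads in an even number of fair spins lies within p of half."""
    half = spins // 2
    favourable = sum(comb(spins, half + r) for r in range(-p, p + 1))
    return favourable / 2 ** spins

if __name__ == "__main__":
    print(exact_window(10_000, 100))   # exact sum of 201 binomial terms
    print(erf(sqrt(2)))                # the integral with limits -sqrt(2), sqrt(2)
```

The exact sum and the integral agree to about one part in a thousand.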

12. C. If the probability of an event happening at a single trial is $p$, the probability $q$ that it will happen $r$ times in $N$ trials is

$$q = \frac{N!}{r!\,(N-r)!}\,p^{r}(1-p)^{N-r}.$$

(i). Put $r = pN + x$, so that $N - r = (1-p)N - x$; and suppose $N$ so great, that the factor $1 + a_N$ in the approximate expression for $N!$ may be safely replaced by unity. Then

$$\begin{aligned}
\log q ={}& -\tfrac12\log 2\pi + (N+\tfrac12)\log N - (pN + x + \tfrac12)\log(pN + x)\\
& - \{(1-p)N - x + \tfrac12\}\log\{(1-p)N - x\}\\
& + (pN + x)\log p + \{(1-p)N - x\}\log(1-p)\\[4pt]
={}& -\tfrac12\log 2\pi p(1-p)N - (pN + x + \tfrac12)\log\left(1 + \frac{x}{pN}\right) - \{(1-p)N - x + \tfrac12\}\log\left(1 - \frac{x}{(1-p)N}\right)\\[4pt]
={}& -\tfrac12\log 2\pi p(1-p)N - \frac{1}{2N}\left(\frac{x^2+x}{p} + \frac{x^2-x}{1-p}\right) + \frac{1}{6N^2}\left(\frac{x^3+\frac32x^2}{p^2} - \frac{x^3-\frac32x^2}{(1-p)^2}\right) + \text{terms in } \frac{1}{N^3},
\end{aligned}$$

and therefore

$$q = \frac{1}{\sqrt{2\pi p(1-p)N}} \times e^{-\frac{1}{2N}\frac{x^2+(1-2p)x}{p(1-p)} + \frac{1}{6N^2}\frac{(1-2p)x^3+\frac32(1-2p+2p^2)x^2}{p^2(1-p)^2}}.$$

When $N$ is large enough, the factor

$$e^{\frac{1}{6N^2}\frac{(1-2p)x^3+\frac32(1-2p+2p^2)x^2}{p^2(1-p)^2}}$$

cannot differ sensibly from unity, until $x$ is of the order $N^{\frac23}$. But when $x$ is of this order, the factor

$$e^{-\frac{1}{2N}\frac{x^2+(1-2p)x}{p(1-p)}}$$

is excessively small. Again, the factor

$$e^{-\frac{1}{2N}\frac{(1-2p)x}{p(1-p)}}$$

will not differ sensibly from unity, until $x$ is of the order $N$, and then, again,

$$e^{-\frac{1}{2N}\frac{x^2}{p(1-p)}}$$

is excessively small.

Hence

$$q = \frac{1}{\sqrt{2\pi p(1-p)N}}\,e^{-\frac{x^2}{2Np(1-p)}}$$

is a good approximation to the value of $q$, so long as this value is appreciable. If, however, it were necessary to determine the numerical value of $q$ for values of $x$, for which $q$ is excessively small, this approximation might not hold.
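The approximation for a general value of $p$ can be set beside the exact term in the same way. A minimal sketch, with hypothetical function names and illustrative values $N = 1000$, $p = 0.3$:

```python
from math import comb, exp, pi, sqrt

def binomial_probability(N, r, p):
    """Exact chance of r happenings in N trials."""
    return comb(N, r) * p ** r * (1 - p) ** (N - r)

def normal_approximation(N, r, p):
    """exp(-x^2 / (2 N p (1-p))) / sqrt(2 pi p (1-p) N), with x = r - pN."""
    x = r - p * N
    spread = N * p * (1 - p)
    return exp(-x * x / (2 * spread)) / sqrt(2 * pi * spread)

if __name__ == "__main__":
    for r in (300, 310, 330, 400):
        e, a = binomial_probability(1000, r, 0.3), normal_approximation(1000, r, 0.3)
        print(r, e, a, e / a)
```

Near $r = pN$ the two agree to within a per cent or so; the ratio departs from unity only where both values have become very small.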

(ii). In the case, in which either $p$ or $1-p$ is very small, another approximation to $q$ may be obtained, which is for some calculations more convenient than the preceding. Suppose that $p$ is very small; and put $pN = \nu$. Then

$$q = \frac{\nu^r}{r!}\cdot\frac{N!}{(N-r)!\,N^r}\left(1-\frac{\nu}{N}\right)^{N-r},$$

and, replacing the factorials by their approximate values,

$$q = \frac{\nu^r}{r!}\left(\frac{N}{N-r}\right)^{\frac12}\frac{N^{N}e^{-N}}{(N-r)^{N-r}\,e^{-N+r}\,N^{r}}\left(1-\frac{\nu}{N}\right)^{N-r}.$$

Now $\nu$ is very small compared to $N$. If $r$ differs very much from $\nu$, the value of $q$ is excessively small. On the other hand, if $r$ does not differ too much from $\nu$,

$$\left(1-\frac{r}{N}\right)^{N-r+\frac12} = e^{-r}, \qquad \left(1-\frac{\nu}{N}\right)^{N-r} = e^{-\nu} \quad\text{very nearly},$$

so that

$$q = \frac{e^{-\nu}\,\nu^r}{r!} \quad\text{very nearly}.$$
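The second approximation, for very small $p$, can be checked against the exact term. A minimal sketch with hypothetical names; the values $N = 10{,}000$ and $p = 0.0003$ (so that $\nu = 3$) are illustrative only.

```python
from math import comb, exp, factorial

def binomial_probability(N, r, p):
    """Exact chance of r happenings in N trials."""
    return comb(N, r) * p ** r * (1 - p) ** (N - r)

def small_p_approximation(nu, r):
    """exp(-nu) * nu^r / r!, with nu = p * N."""
    return exp(-nu) * nu ** r / factorial(r)

if __name__ == "__main__":
    N, p = 10_000, 0.0003
    for r in range(9):
        print(r, binomial_probability(N, r, p), small_p_approximation(N * p, r))
```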

Probable Value: Most Probable Value.

13. It is convenient here to introduce two conceptions which prove to be of great value in many applications of the theory of probabilities; viz. those of the "probable value," and the "most probable value."

If a number can take any one of the distinct values

$$a_i, \qquad (i = 1, 2, \ldots, n),$$

and if the probability that the number takes the value $a_i$ is $p_i$ $(i = 1, 2, \ldots, n)$, so that $\sum p_i = 1$, then

$$\sum_i p_i a_i$$

is called the probable value of the number; and if $p_k$ is the greatest of the $p$'s, $a_k$ is called the most probable value.

It is to be noticed that the probable value is not necessarily one of the values that the number actually takes; it is the mean of the values when the weight, given to each in taking the mean, is proportional to its probability.
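The two conceptions can be put into a few lines of code. The sketch below uses hypothetical function names, and the die examples are illustrations that do not come from the text.

```python
from fractions import Fraction

def probable_value(values, probabilities):
    """Sum of p_i * a_i: the mean weighted by probability."""
    return sum(p * a for a, p in zip(values, probabilities))

def most_probable_value(values, probabilities):
    """The value a_k whose probability p_k is the greatest."""
    best = max(range(len(values)), key=lambda i: probabilities[i])
    return values[best]

if __name__ == "__main__":
    # A single fair die: the probable value 7/2 is not a value the die can shew.
    faces = [1, 2, 3, 4, 5, 6]
    print(probable_value(faces, [Fraction(1, 6)] * 6))
    # The total of two fair dice: probable and most probable value are both 7.
    totals = list(range(2, 13))
    chances = [Fraction(6 - abs(t - 7), 36) for t in totals]
    print(probable_value(totals, chances), most_probable_value(totals, chances))
```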

14. D. Since $\dfrac{N!}{r!\,(N-r)!}$ is as great as possible when $r = \tfrac12N$ or $\tfrac12(N\pm1)$, according as $N$ is even or odd, the most probable value, of the difference in number of heads and tails in a sequence of $N$ spins, is 0 or 1 according as $N$ is even or odd.

The probable value of the excess of heads over tails (or tails over heads) in $N$ spins is

$$\sum_{r=0}^{N} \frac{N!}{(N-r)!\,r!}\cdot\frac{2r-N}{2^N},$$

and this is obviously zero, as would be expected.

In $2N$ spins, the probable value of the square of the excess of heads over tails is

$$\sum_{r=-N}^{N} \frac{(2N)!}{(N+r)!\,(N-r)!}\cdot\frac{4r^2}{2^{2N}}.$$

This may be evaluated as follows. We have

$$\sum_{r=-N}^{N} \frac{(2N)!}{(N+r)!\,(N-r)!}\,x^{2r} = \frac{(1+x^2)^{2N}}{x^{2N}};$$

so that, operating twice with $x\dfrac{d}{dx}$ on each side and then putting $x = 1$,

$$\sum_{r=-N}^{N} \frac{(2N)!}{(N+r)!\,(N-r)!}\cdot\frac{4r^2}{2^{2N}} = 2N.$$

It may be shewn, in a similar way, that the probable value of the fourth power of the excess of heads over tails is $12N^2 - 4N$; while the probable value of any odd power of the excess is zero.
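The probable values of the powers of the excess can be verified exactly for small $N$ by summing over all possible numbers of heads. A minimal sketch with a hypothetical function name:

```python
from fractions import Fraction
from math import comb

def excess_moment(N, k):
    """Exact probable value of (heads - tails)**k in 2N spins of a fair coin."""
    return sum(
        Fraction(comb(2 * N, N + r), 4 ** N) * (2 * r) ** k
        for r in range(-N, N + 1)
    )

if __name__ == "__main__":
    for N in range(1, 6):
        print(N, excess_moment(N, 2), excess_moment(N, 3), excess_moment(N, 4))
```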

It is interesting to note that, if the approximate value

$$\frac{1}{\sqrt{\pi N}}\,e^{-\frac{r^2}{N}}$$

of the probability is used instead of the true value, the probable value of the square of the excess is

$$\sum_{r=-N}^{N} \frac{1}{\sqrt{\pi N}}\,4r^2\,e^{-\frac{r^2}{N}},$$

which is sensibly equal to

$$\frac{1}{\sqrt{\pi}}\int_{-\infty}^{\infty} 4Nx^2e^{-x^2}\,dx, \quad\text{that is, } 2N.$$

A similar calculation of the probable value of the fourth power of the excess gives

$$\frac{1}{\sqrt{\pi}}\int_{-\infty}^{\infty} 16N^2x^4e^{-x^2}\,dx, \quad\text{that is, } 12N^2.$$

This is about $\left(1 + \dfrac{1}{3N}\right)$ of the true value. Since the approximate expression for the probability has been seen to be too great for considerable values of $r$, an over-estimate of the probable value
was to be expected, when making use of the approximation.

E. Any sequence of $N$ spins will fall into a series of sequences of heads and tails. The first spin necessarily starts a sequence of heads or tails. Suppose that, from the remaining $N-1$ spins, $M-1$ are chosen in any way; and that these $M-1$, and these only, start the sequences of heads and tails other than the first. This implies that the sequence of $N$ spins falls into $M$ sequences of heads and tails. Corresponding to each way in which the $M-1$ are chosen, there will be two distinct sets of sequences of heads and tails; for the first sequence may be either heads or tails. Now the number of ways, in which $M-1$ things may be chosen from $N-1$, is

$$\frac{(N-1)!}{(M-1)!\,(N-M)!}.$$

There are therefore just twice this number of ways, in which the $N$ spins may fall into $M$ sequences of heads and tails. It follows that the probability, that a sequence of $N$ spins falls into $M$ sequences of heads and tails, is

$$\frac{(N-1)!}{(M-1)!\,(N-M)!}\cdot\frac{1}{2^{N-1}}.$$

The most probable number of sequences of heads and tails is that for which this number is as great as possible; i.e. for which $M-1$ and $N-M$ are equal or differ by unity. Hence, if $N$ is odd, the most probable number of sequences of heads and tails is $\tfrac12(N+1)$; while, if $N$ is even, it is either $\tfrac12N$ or $\tfrac12N+1$.

The probable number of sequences of heads and tails is

$$\sum_{M=1}^{N} \frac{(N-1)!}{(M-1)!\,(N-M)!}\cdot\frac{M}{2^{N-1}},$$

and, by the method already used, this is found to be $\tfrac12(N+1)$. For $M$ sequences, the average number of heads or tails in a sequence is $N/M$; hence, in a long series of $N$ spins, the probable value and also the most probable value of the average number of heads or tails in a sequence is 2.

Suppose now that, in a set of $M$ sequences, there are $m_i$ sequences of $i$ $(i = 1, 2, 3, \ldots)$. These numbers are obviously connected by the two equations

$$\sum m_i = M, \qquad \sum i\,m_i = N.$$

Twice the number of solutions of these equations in positive integers gives the number of ways, in which a set of $N$ spins may fall into $M$ sequences of heads and tails. The number of solutions of these equations is the same as the coefficient of $x^N$ in

$$(x + x^2 + x^3 + \cdots)^M;$$

and there is no difficulty in verifying in this way the result already obtained.

Suppose, next, that each sequence of heads or tails is limited to contain not more than $r$ members. Then the number of solutions of the two equations between the $m$'s, with this limitation, is the coefficient of $x^N$ in

$$(x + x^2 + \cdots + x^r)^M.$$

Hence twice the coefficient of $x^N$ in

$$\sum_{M} (x + x^2 + \cdots + x^r)^M,$$

that is, in

$$\frac{1-x}{1-2x+x^{r+1}},$$

is the number of ways in which a set of $N$ spins may fall in sequences of heads and tails not exceeding $r$ in a sequence. This gives a new, but in general much less effective, solution of the question of p. 47.
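Both counts in E can be confirmed by enumeration for small $N$. The sketch below uses hypothetical function names; `series_coefficients` expands $(1-x)/(1-2x+x^{r+1})$ by the recurrence obtained on multiplying through by the denominator.

```python
from itertools import groupby, product
from math import comb

def run_lengths(spins):
    """Lengths of the successive sequences of like results."""
    return [len(list(group)) for _, group in groupby(spins)]

def count_by_runs(N, M):
    """Brute force: arrangements of N spins falling into just M sequences."""
    return sum(1 for s in product("HT", repeat=N) if len(run_lengths(s)) == M)

def count_bounded_runs(N, r):
    """Brute force: arrangements of N spins with no sequence longer than r."""
    return sum(1 for s in product("HT", repeat=N) if max(run_lengths(s)) <= r)

def series_coefficients(r, N):
    """Coefficients of x^0 .. x^N in (1 - x) / (1 - 2x + x^(r+1))."""
    s = []
    for k in range(N + 1):
        c = (1 if k == 0 else 0) - (1 if k == 1 else 0)   # numerator 1 - x
        if k >= 1:
            c += 2 * s[k - 1]
        if k >= r + 1:
            c -= s[k - r - 1]
        s.append(c)
    return s

if __name__ == "__main__":
    print([count_by_runs(6, M) for M in range(1, 7)])
    print(count_bounded_runs(8, 3), 2 * series_coefficients(3, 8)[8])
```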

F. The number of ways, in which a set of $M$ sequences of heads and tails, in which there are $m_i$ sequences of $i$ $(i = 1, 2, 3, \ldots)$, can occur, is the same as the number of permutations of $M$ things which are alike in sets of $m_i$ $(i = 1, 2, 3, \ldots)$; and this is

$$\frac{(m_1 + m_2 + \cdots)!}{m_1!\,m_2!\cdots}.$$

Hence the most probable set of $M$ sequences is that for which

$$\frac{(m_1 + m_2 + \cdots)!}{m_1!\,m_2!\cdots}\cdot\frac{1}{2^{N-1}}$$

is as great as possible. This involves determining the least value of $m_1!\,m_2!\cdots$, subject to the conditions

$$\sum m_i = M, \qquad \sum i\,m_i = N.$$

When $N$ is large enough, a roughly approximate solution of this problem may be found as follows. Thus

$$\log(m_1!\,m_2!\cdots) = \sum\left\{\tfrac12\log 2\pi + (m_i + \tfrac12)\log m_i - m_i\right\},$$

which differs by a constant from

$$\sum (m_i + \tfrac12)\log m_i.$$

If the $m$'s are treated as continuously varying quantities, the minimum value of the last magnitude, subject to the above two conditions, is given by

$$\log m_i + 1 + \frac{1}{2m_i} + A + Bi = 0, \qquad (i = 1, 2, 3, \ldots),$$

where $A$ and $B$ are constants. Omitting the terms $\dfrac{1}{2m_i}$ in comparison with $\log m_i$, these equations may be written

$$m_i = e^{-1-A-Bi}, \qquad (i = 1, 2, \ldots).$$

Hence

$$M = e^{-1-A}\sum e^{-Bi} = e^{-1-A}\,\frac{e^{-B}}{1-e^{-B}}, \qquad N = e^{-1-A}\sum i\,e^{-Bi} = e^{-1-A}\,\frac{e^{-B}}{(1-e^{-B})^2},$$

so that

$$e^{-B} = 1 - \frac{M}{N}, \qquad e^{-1-A} = \frac{M^2}{N-M}, \qquad m_i = \frac{M^2}{N-M}\left(\frac{N-M}{N}\right)^{i}.$$

For the most probable number of sequences $M = \tfrac12N$: and the most probable set in this case is that for which $m_i = 2^{-i-1}N$.
It is interesting to compare the results, that have been obtained on the supposition that the probability of a coin falling head is definitely known, with those deduced from the complete data regarding a selection of the coin. Suppose there are $n$ coins, and that the chance of the $i$th coin falling head is $p_i$ $(i = 1, 2, \ldots, n)$. Suppose also that, when one of the coins is chosen, the chance of choosing the $j$th is $q_j$ $(j = 1, 2, \ldots, n)$.

If a sequence of $N$ spins is made with a chosen coin, the probability of the various results will clearly depend on the number of spins made with a chosen coin before a fresh one is chosen. For instance, if a fresh coin is chosen for each spin, the probability of a head at each spin is $\sum q_ip_i$; and the problem is the same as those already considered.

Suppose that the chosen coin is used throughout the $N$ spins: what is the probability of $r$ heads and $N-r$ tails? If the $i$th coin is chosen, the probability is

$$\frac{N!}{r!\,(N-r)!}\,p_i^{\,r}(1-p_i)^{N-r}.$$

Hence the required probability is

$$\frac{N!}{r!\,(N-r)!}\sum_i q_i\,p_i^{\,r}(1-p_i)^{N-r}.$$

Consider the case in which

$$q_i = \frac{1}{n}, \qquad p_i = \frac{i}{n+1}, \qquad (i = 1, 2, \ldots, n),$$

so that the chance of the coin falling head is equally likely to have any one of the values

$$\frac{1}{n+1},\quad \frac{2}{n+1},\quad \ldots,\quad \frac{n}{n+1}.$$

The probability of $r$ heads and $N-r$ tails is then

$$\frac{N!}{r!\,(N-r)!}\sum_{i=1}^{n} \frac{i^{\,r}(n+1-i)^{N-r}}{n\,(n+1)^{N}}.$$

If $n$ is not too small, the sum in this expression differs very little from

$$\int_0^1 x^{r}(1-x)^{N-r}\,dx,$$

the value of which is $\dfrac{r!\,(N-r)!}{(N+1)!}$.

Hence, if $n$ is large enough, the required probability is very nearly independent of $r$ and equal to $\dfrac{1}{N+1}$; a marked contrast with the results already obtained.
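The contrast can be seen numerically by evaluating the sum for a large number of coins. A minimal sketch, with a hypothetical function name and the illustrative values $N = 10$, $n = 2000$:

```python
from math import comb

def mixture_probability(N, r, n):
    """Chance of r heads in N spins with one coin kept throughout, when the coin's
    chance of head is equally likely to be 1/(n+1), 2/(n+1), ..., n/(n+1)."""
    total = sum(
        (i / (n + 1)) ** r * ((n + 1 - i) / (n + 1)) ** (N - r)
        for i in range(1, n + 1)
    )
    return comb(N, r) * total / n

if __name__ == "__main__":
    print([round(mixture_probability(10, r, 2000), 5) for r in range(11)])
    print(1 / 11)
```

Every number of heads from 0 to 10 comes out with very nearly the same probability $1/11$.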

15. G. So far in dealing with a repeated trial, it is only the probabilities connected with the satisfying or not satisfying of a single condition, that have been considered.

Suppose now that $A$ and $B$ are two different conditions relevant to the results of the trial. When the trial is repeated $N$ times, suppose that, on $N_1$ specified occasions, $A$ and $B$ are both satisfied; on $N_2$ occasions, $A$ is satisfied and $B$ is not; on $N_3$ occasions, $A$ is not satisfied and $B$ is; and on

$$N_4 = N - N_1 - N_2 - N_3$$

occasions, neither condition is satisfied. The probability for this combination is

$$p_{AB}^{\,N_1}\;p_{AB'}^{\,N_2}\;p_{A'B}^{\,N_3}\;p_{A'B'}^{\,N_4}.$$

Now the $N_1$, $N_2$, $N_3$ and $N_4$ specified occasions can be chosen from the $N$ in

$$\frac{N!}{N_1!\,N_2!\,N_3!\,N_4!}$$

ways. Hence the probability that in the $N$ trials, $A$ and $B$ are both satisfied $N_1$ times, and so on, is

$$\frac{N!}{N_1!\,N_2!\,N_3!\,N_4!}\;p_{AB}^{\,N_1}\;p_{AB'}^{\,N_2}\;p_{A'B}^{\,N_3}\;p_{A'B'}^{\,N_4}.$$

If condition $A$ is to be satisfied in just $r$ of the trials, and condition $B$ in just $s$ of the trials, then

$$N_1 + N_2 = r, \qquad N_1 + N_3 = s.$$

Hence the probability $q$, that $A$ is satisfied in just $r$ and $B$ in just $s$ of the $N$ trials, is

$$q = \sum \frac{N!}{N_1!\,(r-N_1)!\,(s-N_1)!\,(N-r-s+N_1)!}\;p_{AB}^{\,N_1}\;p_{AB'}^{\,r-N_1}\;p_{A'B}^{\,s-N_1}\;p_{A'B'}^{\,N-r-s+N_1},$$

where the sum is taken for those values of $N_1$ which make no one of the numbers $N_1$, $r-N_1$, $s-N_1$, $N-r-s+N_1$ negative. If $r \ge s$, the greatest value of $N_1$ is $s$; and the least value will be 0 or $r+s-N$, according as $r+s$ is less or greater than $N$.

Putting $\dfrac{p_{AB}\,p_{A'B'}}{p_{AB'}\,p_{A'B}} = \lambda$, the formula becomes

$$q = p_{AB'}^{\,r}\;p_{A'B}^{\,s}\;p_{A'B'}^{\,N-r-s}\sum f(N_1)\,\lambda^{N_1},$$

where

$$f(N_1) = \frac{N!}{N_1!\,(r-N_1)!\,(s-N_1)!\,(N-r-s+N_1)!}.$$
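When $N$ is large, an approximate expression for this probability may be obtained, similar to that of p. 44 for a single condition. Thus, if

$$\begin{aligned}
N_1 &= p_{AB}N + x_1 = p_1N + x_1, & N_2 &= p_{AB'}N + x_2 = p_2N + x_2,\\
N_3 &= p_{A'B}N + x_3 = p_3N + x_3, & N_4 &= p_{A'B'}N + x_4 = p_4N + x_4,
\end{aligned}$$

where $x_1 + x_2 + x_3 + x_4 = 0$, then

$$\begin{aligned}
\log q ={}& \log N! - \log(p_1N + x_1)! - \log(p_2N + x_2)! - \log(p_3N + x_3)! - \log(p_4N + x_4)!\\
& + (p_1N + x_1)\log p_1 + (p_2N + x_2)\log p_2 + (p_3N + x_3)\log p_3 + (p_4N + x_4)\log p_4.
\end{aligned}$$

Writing $\log n! = \tfrac12\log 2\pi + (n + \tfrac12)\log n - n$, we have

$$\log q = -\tfrac32\log 2\pi - \tfrac32\log N - \tfrac12\log p_1p_2p_3p_4 - \sum_{i=1}^{4}\left(p_iN + x_i + \tfrac12\right)\left(\frac{x_i}{p_iN} - \frac{x_i^2}{2p_i^2N^2} + \cdots\right),$$

so that, when $N$ is large enough,

$$q = \frac{1}{(2\pi N)^{\frac32}\sqrt{p_1p_2p_3p_4}}\;e^{-\frac{1}{2N}\left(\frac{x_1^2+x_1}{p_1} + \frac{x_2^2+x_2}{p_2} + \frac{x_3^2+x_3}{p_3} + \frac{x_4^2+x_4}{p_4}\right)}$$

very nearly.

The probability that, in the $N$ trials, the condition $A$ is satisfied $(p_1+p_2)N + y$ times and the condition $B$ is satisfied $(p_1+p_3)N + z$ times, is therefore $\sum q$ for the values satisfying

$$x_1 + x_2 + x_3 + x_4 = 0, \qquad x_1 + x_2 = y, \qquad x_1 + x_3 = z.$$

With these values, it will be found that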


Pi 
,1111 

.Pi      P2      Pi       Pa 


Xi    -T  OOy         X^    ~l    X2         X^    +  ^3         X^    ~r  X^ 

1 1 r 


P2 


Pz 


P* 


(: 


\pi     pj^     \Pi     pJ^     ^\Pi     P2     Pz     pJ 
x-  iocj,  — - — ~ H — H ^ — h ^ —  r 


1111 

Pi         P2         Pi  P4 


\Pi       pJ\P2       pJ  "^  \pi      pJ\Pz       pJ 


1111 

Pi      P2      P%       Pa 

where         2yo  =  Pi+p2-p3-PA, 
2zo=pi-{-p3-p2-pA, 

-  40  =  —  +  —  +  —  +  — 2 
pi"     P2      Pz      Pa 


-'[h^^y-^^''^'"-'"-'"'^ 


-2 


PiPz      P2PA 


{i-Hpi+pt-P2-PAy} 


P\Pa 


+  2-^{l  +  2(i>.-p3)1. 

PzPs 
With the abbreviations

$$\alpha = \left(\frac{1}{p_1}+\frac{1}{p_3}\right)\left(\frac{1}{p_2}+\frac{1}{p_4}\right), \qquad \gamma = \left(\frac{1}{p_1}+\frac{1}{p_2}\right)\left(\frac{1}{p_3}+\frac{1}{p_4}\right),$$

$$\beta = \frac{1}{p_1p_4} - \frac{1}{p_2p_3}, \qquad \delta = \frac{1}{p_1}+\frac{1}{p_2}+\frac{1}{p_3}+\frac{1}{p_4},$$

the required probability $\sum q$ becomes

$$\frac{1}{\sqrt{p_1p_2p_3p_4}\,(2\pi N)^{\frac32}}\sum e^{-\frac{1}{2\delta N}\left[\delta^2(x_4-x_0)^2 + \alpha(y+y_0)^2 + 2\beta(y+y_0)(z+z_0) + \gamma(z+z_0)^2 + C\right]}.$$

The sum is taken with respect to values of $x_4$ increasing by unity at a step. Now the $N_1$, $N_2$, $N_3$, $N_4$ of the original notation are $p_1N + x_4 + y + z$, $p_2N - x_4 - z$, $p_3N - x_4 - y$, and $p_4N + x_4$; and no one of these must be negative. Hence $x_4$ ranges from the larger of the integers $-p_4N$ and $-p_1N - y - z$ to the smaller of the integers $p_2N - z$ and $p_3N - y$. In other words, the lower and the upper limits of $x_4$ are of the orders $-N$ and $N$. The same is true of the lower and the upper limits of $x_4 - x_0$, until either $y$ or $z$ is of the order $N$, in which case $q$ is excessively small.

Now, if $N$ is not too small, $\sum e^{-\frac{\delta}{2N}(x_4-x_0)^2}$, under these conditions, is sensibly

$$\sqrt{\frac{2\pi N}{\delta}}.$$

Further, $e^{-\frac{C}{2\delta N}}$ differs very little from unity under the conditions assumed. Hence, $y_0$ and $z_0$ being proper fractions, the required approximation may be written

$$q = \frac{1}{\sqrt{p_1p_2p_3p_4\,\delta}\;2\pi N}\;e^{-\frac{\alpha y^2 + 2\beta yz + \gamma z^2}{2\delta N}} \quad\text{very nearly}.$$

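This quantity varies slowly with $y$ and $z$. Now $y$ is the excess of the number of times, that condition $A$ is satisfied, over $(p_1+p_2)N$, which is the probable number of times it is satisfied; and a similar statement may be made for $z$. The probability that the excess for condition $A$ lies between $y$ and $y + \delta y$, and for condition $B$ between $z$ and $z + \delta z$, is $q\,\delta y\,\delta z$.

From this expression the probability, that these excesses should lie between given limits, can be determined approximately by integration.

For instance, the probability, that the excess for condition $A$ lies between $y$ and $y + \delta y$, is

$$\delta y\int_{-\infty}^{\infty} \frac{1}{\sqrt{p_1p_2p_3p_4\,\delta}\;2\pi N}\;e^{-\frac{\alpha y^2 + 2\beta yz + \gamma z^2}{2\delta N}}\,dz.$$

Now

$$\alpha y^2 + 2\beta yz + \gamma z^2 = \frac{\alpha\gamma - \beta^2}{\gamma}\,y^2 + \gamma\left(z + \frac{\beta}{\gamma}\,y\right)^2,$$

and

$$\int_{-\infty}^{\infty} e^{-\frac{\gamma}{2\delta N}\left(z + \frac{\beta}{\gamma}y\right)^2}dz = \sqrt{\frac{2\pi\delta N}{\gamma}};$$

hence the required probability is

$$\frac{1}{\sqrt{p_1p_2p_3p_4\,\gamma}\;\sqrt{2\pi N}}\;e^{-\frac{\alpha\gamma-\beta^2}{2\gamma\delta N}\,y^2}\,\delta y,$$

or, inserting the values of $\alpha$, $\beta$, $\gamma$, it is

$$\frac{1}{\sqrt{2\pi(p_1+p_2)(p_3+p_4)N}}\;e^{-\frac{y^2}{2(p_1+p_2)(p_3+p_4)N}}\,\delta y.$$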

CHAPTER  V 

PROBABILITY  OF  CAUSES 

16. When an event has happened which may have been due to any one of a number of different causes, the question arises as to which cause has most probably been in action. Is it possible, from the observed happening of the event, to draw any conclusion as to the relative probability of the various causes that may have led to it?

From the discussion in Chapter I, it has been seen that $p_{A_iB}/p_B$ is the probability, that condition $A_i$ is satisfied when condition $B$ is known to be satisfied.

Suppose that $A_1, A_2, \ldots, A_n$ are $n$ conditions of which one must be satisfied, and only one can be satisfied, when a trial is made. Then

$$p_B = \sum_i p_{A_iB},$$

so that

$$\frac{p_{A_iB}}{p_B} = \frac{p_{A_i}\,\dfrac{p_{A_iB}}{p_{A_i}}}{\sum_i p_{A_i}\,\dfrac{p_{A_iB}}{p_{A_i}}} = \frac{p_{A_i}\,p_{(A_i)B}}{\sum_i p_{A_i}\,p_{(A_i)B}}.$$

Suppose now that the event $E$ may have any one of $n$ distinct
causes, of which in a given trial one and only one can come
into play. Let condition $B$ be that the event $E$ shall happen, and
condition $A_i$ be that the $i$th cause has come into play. Then $p_{A_i}$
is the probability, before the trial, that the $i$th cause of $E$ will
come into play; $p_{(A_i)B}$ is the probability that $E$ will happen as
a result of the $i$th cause; and $p_{A_iB}/p_B$ is the probability, when
$E$ has happened, that it has happened as a result of the $i$th
cause. The formula may be conveniently written

$$q_i=\frac{r_i s_i}{\sum_j r_j s_j},$$

where $r_i$ is the probability of the $i$th cause, before the result is
known (the so-called a priori probability of the $i$th cause); $s_i$ is
the probability of the event when the $i$th cause is in action;
and $q_i$ is the probability of the $i$th cause, when the event is
known to have happened (the so-called a posteriori probability).

This formula is known as Bayes' formula*; and so long as
the $r$'s and the $s$'s are known, there can be no ambiguity in
applying it. As a simple illustration, the following case may be
considered.

I.  There  are  n  boxes,  each  containing  white  and  black  objects. 
The  chance  of  drawing  a  white  object  from  the  ith  box  is  pi. 
In  choosing  a  box  from  which  to  draw  an  object,  each  box  is 
equally  likely  to  be  chosen.  An  object,  observed  to  be  white, 
is  drawn  from  a  box;  it  is  then  returned.  A  second  object  is 
drawn  from  the  same  box.  What  is  the  probability  that  it  is 
white  ? 

In this case, $r_i=\frac1n$; so that the probability that the white
object first drawn came from the $i$th box is $p_i/\sum_j p_j$. This is the
probability that the $i$th box is used at the second drawing;
and therefore the probability that the second drawing gives a
white object is

$$\frac{\sum_i p_i^2}{\sum_i p_i}.$$

It should be noted that this is greater than $\frac1n\sum_i p_i$, which is
the probability that a first drawing gives a white object (the
two are equal only when all the $p_i$ are equal).

If from the above statement of conditions the sentence, "In
choosing a box from which to draw an object, each box is equally
likely to be chosen" is omitted, there are no data from which
to calculate $r_i$; and the question proposed cannot be answered.
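
The box illustration can be checked numerically. The following is a minimal sketch, not part of the original text: the function names and the three values chosen for the $p_i$ are assumptions made only for illustration. It applies Bayes' formula with equal a priori probabilities and then averages the $p_i$ over the a posteriori probabilities.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Bayes' formula: q_i = r_i * s_i / sum_j (r_j * s_j)."""
    joint = [r * s for r, s in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

def second_white_probability(p):
    """Chance that the second draw is white, given that the first was white."""
    n = len(p)
    prior = [Fraction(1, n)] * n      # each box equally likely to be chosen
    q = posterior(prior, p)           # q_i = p_i / sum_j p_j
    return sum(qi * pi for qi, pi in zip(q, p))

# Hypothetical chances of white for three boxes (illustrative values only).
p = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]
first = sum(p) / len(p)               # chance that a first draw is white
second = second_white_probability(p)  # chance for the second draw
```

With these assumed values the second-draw chance comes out larger than the first-draw chance, in line with the remark in the text.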

Use of the Bayes formula.

17. The hesitation that is undoubtedly felt in making use of
Bayes' formula depends upon the fact that, though the $s$'s are
generally known, some assumption has to be made with respect
to the $r$'s; and the calculated probabilities of cause depend on
the particular assumption made. This will be brought out as
clearly as possible in some of the following illustrations.

* It is due to the Rev. Thomas Bayes (elected F.R.S. 1742): for an abstract of
his two memoirs, see Todhunter, History of the Theory of Probability, ch. xiv.

II. A box contains $n$ objects, each of which is either white or
black, and each of which is equally likely to be drawn. An object
is drawn and is found to be white. It is returned; and an object
is drawn again. What is the probability that it will be white?
Denote by $p_r$ the a priori probability that $r$ of the objects
are white. Then

$$\frac{r\,p_r}{\sum_r r\,p_r}$$

is the a posteriori probability that $r$ are white; and the
probability of drawing a white object at the second trial is

$$\frac{\sum_r r^2 p_r}{n\sum_r r\,p_r}.$$

If it is assumed that $p_r$ is independent of $r$ and therefore
equal to $\frac{1}{n+1}$, this last probability is

$$\frac23+\frac{1}{3n}.$$

If however each object in the box is assumed to be equally
likely black or white, then $p_r=\frac{n!}{r!\,(n-r)!}\cdot\frac{1}{2^n}$; and the required
probability is

$$\frac12+\frac{1}{2n}.$$

With regard to such a question, it may be suggested that
the data are very meagre; therefore it is not surprising that
different assumptions about the a priori probability lead to
very different results.
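
The dependence on the a priori assumption can be made concrete. The sketch below is illustrative only: the helper names and the choice $n=5$ are assumptions, not part of the text. It evaluates $\sum r^2p_r/(n\sum r\,p_r)$ exactly for the two priors discussed.

```python
from fractions import Fraction
from math import comb

def second_white(n, prior):
    """prior[r] is the a priori probability that r of the n objects are white."""
    num = sum(Fraction(r * r) * prior[r] for r in range(n + 1))
    den = n * sum(Fraction(r) * prior[r] for r in range(n + 1))
    return num / den

def uniform_prior(n):
    # Every value r = 0, 1, ..., n equally probable.
    return [Fraction(1, n + 1)] * (n + 1)

def binomial_prior(n):
    # Each object independently white or black with chance 1/2.
    return [Fraction(comb(n, r), 2 ** n) for r in range(n + 1)]

n = 5  # illustrative box size
uniform_result = second_white(n, uniform_prior(n))
binomial_result = second_white(n, binomial_prior(n))
```

For $n=5$ the two assumptions give 11/15 and 3/5 respectively, so the gap between them does not close as $n$ grows; it tends to the difference between 2/3 and 1/2.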

III. A box contains a number $N$ of objects not greater than
$M$; and it is known that $n$ of them are marked. It is assumed
that, when a set of $m$ objects is drawn from the box, all sets of
$m$ are equally likely. A set of $m$ is drawn; and it is found that
$m_1$ of them are marked. What is the most probable value of $N$?

It follows, from the data, that $N$ is equal to or greater than
$n+m-m_1$. The probability of the observed event, when the
box contains $N$ objects, is

$$\frac{n!}{m_1!\,(n-m_1)!}\cdot\frac{(N-n)!}{(m-m_1)!\,(N-n-m+m_1)!}\bigg/\frac{N!}{m!\,(N-m)!},$$

that is,

$$\frac{m!\,n!}{m_1!\,(n-m_1)!\,(m-m_1)!}\cdot\frac{(N-n)!\,(N-m)!}{N!\,(N-n-m+m_1)!}.$$

Hence, if $p_N$ is the a priori probability that the box contains
$N$ objects, then, after the event, the probability is

$$\frac{f(N)\,p_N}{\sum_{N=n+m-m_1}^{M} f(N)\,p_N},
\qquad\text{where}\quad f(N)=\frac{(N-m)!\,(N-n)!}{N!\,(N-n-m+m_1)!}.$$

The most probable value of $N$ is that which makes $f(N)\,p_N$
as great as possible.

Suppose first that all possible values of $N$ are a priori
equally probable, so that $p_N$ is independent of $N$. Then the
most probable value of $N$ satisfies the inequalities

$$f(N)>f(N-1),\qquad f(N)>f(N+1),$$

which gives, for $N$, the greatest integer in $nm/m_1$.

Suppose next that $p_N\propto N$, so that large values of $N$ are
a priori more likely than small ones. The most probable value
of $N$ is then given by

$$N f(N)>(N-1)\,f(N-1),\qquad N f(N)>(N+1)\,f(N+1),$$

which gives, for $N$, the greatest integer in $(n-1)(m-1)/(m_1-1)$.

If lastly $p_N\propto\frac{1}{N+1}$, so that small values of $N$ are a priori
more likely than large ones, the inequalities are

$$\frac{f(N)}{N+1}>\frac{f(N-1)}{N},\qquad \frac{f(N)}{N+1}>\frac{f(N+1)}{N+2},$$

giving, for $N$, the greatest integer in

$$\{(m+1)(n+1)-m_1-2\}/(m_1+1).$$

It will be noticed that, so long as $m$, $m_1$, $n$ are not quite small
numbers, the three different suppositions with respect to $p_N$
lead to results in close agreement with each other.
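
A direct search over $N$ is a useful companion to the closed forms. The sketch below is an illustration under stated assumptions: the data ($n=8$, $m=12$, $m_1=5$, $M=200$) and all function names are hypothetical. It maximises the product of the likelihood and an a priori weight for each of the three suppositions; the closed forms quoted in the text should agree with such a search to within about a unit.

```python
from fractions import Fraction
from math import comb

def likelihood(N, n, m, m1):
    """Chance that a random set of m out of N objects (n marked) holds m1 marked."""
    return Fraction(comb(n, m1) * comb(N - n, m - m1), comb(N, m))

def most_probable_N(n, m, m1, M, prior_weight):
    """Value of N in [n + m - m1, M] maximising likelihood * a priori weight."""
    candidates = range(n + m - m1, M + 1)
    return max(candidates,
               key=lambda N: likelihood(N, n, m, m1) * prior_weight(N))

# Hypothetical data: 8 marked objects, 12 drawn, 5 of those marked, N <= 200.
n, m, m1, M = 8, 12, 5, 200
flat = most_probable_N(n, m, m1, M, lambda N: Fraction(1))            # p_N constant
rising = most_probable_N(n, m, m1, M, lambda N: Fraction(N))          # p_N proportional to N
falling = most_probable_N(n, m, m1, M, lambda N: Fraction(1, N + 1))  # p_N proportional to 1/(N+1)
```

With these assumed data the three searches return neighbouring integers, which illustrates the closing remark that the suppositions about $p_N$ matter little unless $m$, $m_1$, $n$ are quite small.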

IV. A and B play a game, at which A's a priori chance of
winning is equally likely to be $\frac1n, \frac2n, \dots, \frac{n-2}{n}$, or $\frac{n-1}{n}$. Out
of a set of $a+b$ games, A wins $a$ and loses $b$. What is the
probable value of A's chance of winning the next game?

If A's chance of winning is $\frac rn$, the probability of the observed
result of the $a+b$ games is

$$\frac{(a+b)!}{a!\,b!}\left(\frac rn\right)^a\left(1-\frac rn\right)^b.$$

Hence the a posteriori probability, that A's chance of winning
the next game shall be $\frac rn$, is

$$\frac{\left(\dfrac rn\right)^a\left(1-\dfrac rn\right)^b}{\sum_{r=1}^{n-1}\left(\dfrac rn\right)^a\left(1-\dfrac rn\right)^b};$$

and the probable value of A's chance of winning the next
game is

$$\frac{\sum_{r=1}^{n-1}\left(\dfrac rn\right)^{a+1}\left(1-\dfrac rn\right)^b}{\sum_{r=1}^{n-1}\left(\dfrac rn\right)^a\left(1-\dfrac rn\right)^b}.$$

Now, if $n$ is not too small, the quantity
$\frac1n\sum_{r=1}^{n-1}\left(\frac rn\right)^a\left(1-\frac rn\right)^b$
is very nearly equal to

$$\int_0^1 x^a(1-x)^b\,dx=\frac{a!\,b!}{(a+b+1)!}.$$

Hence, if $n$ is large enough, the required result is very nearly
equal to

$$\frac{a+1}{a+b+2}.$$

It has been assumed that the probability, of A's chance of
winning being measured by $\frac rn$, is itself independent of $r$.

Suppose now that the probability of A's chance of winning
being measured by $\frac rn$ is proportional to $\frac rn\left(1-\frac rn\right)$, so that
neither A nor B is extremely likely either to win or lose.
Then the above expression, for the probable value of A's chance
of winning the next game, becomes

$$\frac{\sum_{r=1}^{n-1}\left(\dfrac rn\right)^{a+2}\left(1-\dfrac rn\right)^{b+1}}{\sum_{r=1}^{n-1}\left(\dfrac rn\right)^{a+1}\left(1-\dfrac rn\right)^{b+1}},$$

which is sensibly equal to

$$\frac{a+2}{a+b+4}.$$

It is to be noticed that, if $a$ and $b$ are comparatively large
numbers, this result differs very little from the former one. In
other words, if the result of a sufficiently large number of games
is observed, the hypothesis made as to A's a priori chance of
winning has but little effect on the result. This is obviously
not the fact when the number of games observed is small.
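
The two limiting values can be compared with the finite sums directly. The sketch below is illustrative only: the function name, the record of 7 wins and 3 losses, and $n=1000$ are assumptions. It computes the a posteriori mean of A's chance over the grid $r/n$ for an arbitrary a priori weight.

```python
def probable_chance(a, b, n, prior=lambda x: 1.0):
    """A posteriori mean of A's chance x = r/n after a wins and b losses."""
    grid = [r / n for r in range(1, n)]
    weights = [prior(x) * x**a * (1 - x)**b for x in grid]
    return sum(x * w for x, w in zip(grid, weights)) / sum(weights)

a, b, n = 7, 3, 1000   # hypothetical record: 7 wins, 3 losses
flat = probable_chance(a, b, n)                                  # near (a+1)/(a+b+2)
central = probable_chance(a, b, n, prior=lambda x: x * (1 - x))  # near (a+2)/(a+b+4)
```

Rerunning with, say, 70 wins and 30 losses brings the two results much closer together, which is the point made in the closing paragraph of the illustration.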

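18. V. It is assumed that, when a calculator adds a column
of integers, the probability of his making an error of either $+a$
or $-a$ units is $\frac{1}{2^{a+2}}$; while the probability of his getting the
correct result is $\frac12$. The results of twice adding a given column
are $s_1$ and $s_2$ ($>s_1$). What is the probable value of the sum?

If $s$ is the true sum, the probability $p_s$ of getting $s_1$ and $s_2$
for the sum at two attempts is

$$p_s=\begin{cases}
\dfrac{1}{2^{\,4+s_1+s_2-2s}}, & \text{when } s < s_1,\\[2ex]
\dfrac{1}{2^{\,3+s_2-s_1}}, & \text{when } s = s_1,\\[2ex]
\dfrac{1}{2^{\,4+s_2-s_1}}, & \text{when } s_2 > s > s_1,\\[2ex]
\dfrac{1}{2^{\,3+s_2-s_1}}, & \text{when } s = s_2,\\[2ex]
\dfrac{1}{2^{\,4-s_1-s_2+2s}}, & \text{when } s_2 < s.
\end{cases}$$

Hence, if $\sigma_s$ is the a priori probability that the sum is $s$, the
probable value of the sum is

$$\frac{\sum s\,p_s\,\sigma_s}{\sum p_s\,\sigma_s},$$

where the above values are used for $p_s$.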

Suppose that $s$ may take any value from $A$ to $B$, and that
a priori all these values are equally probable. It will be
assumed, to avoid dealing with particular cases, that $A$ is less
than $s_1$ and $B$ greater than $s_2$. Then the probable value of
the sum is

$$\frac{\sum_{s=A}^{B} s\,p_s}{\sum_{s=A}^{B} p_s}.$$

When the above values of $p_s$ are used, this is found to be

$$\tfrac12(s_1+s_2)+\frac{\{3(s_1+s_2-2A)+8\}\,2^{2A-2s_1}-\{3(2B-s_1-s_2)+8\}\,2^{2s_2-2B}}{18(s_2-s_1)+66-6\cdot2^{2A-2s_1}-6\cdot2^{2s_2-2B}}.$$

If $A$ is very much smaller than $s_1$ and $B$ much larger than $s_2$,
this probable value is very nearly

$$\tfrac12(s_1+s_2).$$

But if $s_1-A$ and $B-s_2$ are small, the probable value differs
sensibly from the arithmetic mean. The supposition that leads
to the arithmetic mean as the probable result, viz. that the
sum to be found is equally likely to have every value in a long
range, does not appear a very reasonable one; indeed, it is out
of the question, if $A$ is negative.

It is also to be noticed that, in this case, there is no most
probable value. Assuming that the sum is equally likely a
priori to take any value from $A$ to $B$, the a posteriori
probability that the sum is $s_1$ is the same as that the sum is $s_2$, and is
greater than the probability that the sum has any other value.
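
The behaviour described here can be reproduced by summing over the admissible values of $s$. The sketch below is an illustration only: the function names and the numerical values ($s_1=3$, $s_2=7$, and the two ranges) are assumptions, and the error law is read as a chance $1/2$ of no error and $1/2^{a+2}$ for each of the errors $+a$ and $-a$, which makes the probabilities sum to unity.

```python
from fractions import Fraction

def error_probability(a):
    """Assumed law: 1/2 for no error, 1/2**(|a|+2) for an error of +a and for -a."""
    return Fraction(1, 2) if a == 0 else Fraction(1, 2 ** (abs(a) + 2))

def posterior(s1, s2, A, B):
    """A posteriori probability of each true sum s in [A, B], flat a priori."""
    weights = {s: error_probability(s1 - s) * error_probability(s2 - s)
               for s in range(A, B + 1)}
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}

def probable_value(s1, s2, A, B):
    return sum(s * q for s, q in posterior(s1, s2, A, B).items())

s1, s2 = 3, 7                              # hypothetical pair of results
wide = probable_value(s1, s2, -40, 60)     # long range: very near (s1 + s2) / 2
narrow = probable_value(s1, s2, 2, 20)     # s1 - A small: pulled above the mean
post = posterior(s1, s2, -40, 60)          # for inspecting the two equal peaks
```

In the wide range the probable value is indistinguishable from 5; in the narrow range it is sensibly larger, and in both the values $s_1$ and $s_2$ share the greatest a posteriori probability.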

It  is  interesting  to  compare  this  result  with  those,  obtained  by 
making  other  assumptions  about  the  accuracy  of  the  calculator. 
If  the  probability  of  his  making  an  error  a  were  ke~^^,  then  in 
the  above  calculation 

If  it  is  still  assumed  that  the  sum  is  equally  likely  to  take 
all  values  from  A  to  B,  the  probable  value  of  the  sum  is 

1  se        ^        '-    ^ 


.  -..p-^^y 


Z  e 

s=A 


A small value of $h$ implies considerable inaccuracy on the
part of the calculator. If neither $h$, $B-\tfrac12(s_1+s_2)$, nor $\tfrac12(s_1+s_2)-A$,
be quite small, it is easy to see that this fraction is nearly
equal to $\tfrac12(s_1+s_2)$, independently of the actual values of $A$ and
$B$. Moreover, in this case, the most probable value of the sum
is clearly the arithmetic mean of $s_1$ and $s_2$.

With the second assumption about the calculator's errors,
the results, concerning the probable and most probable values
of the sum, are more definite and less dependent on the a priori
probability of a given sum than with the first.

19.  VI.  An  observer  watches  the  spinning  of  a  coin,  and  notes 
the  sequences  of  heads  and  tails.  What  is  the  probable  number 
of  spins,  that  have  occurred,  when  he  has  noted  M  sequences  ? 

The number, $N$, of spins must be equal to or greater than $M$.
On the supposition that the number of spins is $N$, the
probability of the observed events is

$$\frac{(N-1)!}{(M-1)!\,(N-M)!}\cdot\frac{1}{2^{N-1}}.$$

Hence, if the a priori probability that the number of spins is
$N$ be represented by $p_N$, the probable number of spins is

$$\frac{\sum_N \dfrac{N!}{(N-M)!}\,\dfrac{1}{2^{N-1}}\,p_N}{\sum_N \dfrac{(N-1)!}{(N-M)!}\,\dfrac{1}{2^{N-1}}\,p_N}.$$

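On the assumption that all numbers of spins equal to or
exceeding $M$ are a priori equally probable, this is

$$M\,\frac{1+\dfrac{M+1}{1}\,\dfrac12+\dfrac{(M+1)(M+2)}{1\cdot2}\,\dfrac{1}{2^2}+\dots}{1+\dfrac{M}{1}\,\dfrac12+\dfrac{M(M+1)}{1\cdot2}\,\dfrac{1}{2^2}+\dots}
=M\,\frac{\left(1-\tfrac12\right)^{-M-1}}{\left(1-\tfrac12\right)^{-M}}=2M.$$

Moreover, on the same assumption, the most probable value
of $N$ is $2M$.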

Now, in this question, it is not a reasonable assumption that
all values of $N$ above $M$ are equally probable. The spinning
must take time; and for this reason there must be an upper
limit to $N$. If it is assumed that all values of $N$ from $M$ to $M'$
are equally probable, the probable value of $N$ is

$$M\,\frac{S}{S'},$$

where

$$S=1+\frac{M+1}{1}\,\frac12+\frac{(M+1)(M+2)}{2!}\,\frac{1}{2^2}+\dots+\frac{(M+1)(M+2)\dots M'}{(M'-M)!}\,\frac{1}{2^{M'-M}},$$

$$S'=1+\frac{M}{1}\,\frac12+\frac{M(M+1)}{2!}\,\frac{1}{2^2}+\dots+\frac{M(M+1)\dots(M'-1)}{(M'-M)!}\,\frac{1}{2^{M'-M}},$$

so that

$$S-S'=\frac12+\frac{M+1}{1}\,\frac{1}{2^2}+\dots+\frac{(M+1)(M+2)\dots(M'-1)}{(M'-M-1)!}\,\frac{1}{2^{M'-M}}
=\frac12\left\{S-\frac{(M+1)(M+2)\dots M'}{(M'-M)!}\,\frac{1}{2^{M'-M}}\right\}.$$

Hence the probable value of $N$ is

$$\frac{2M}{1+\dfrac{(M+1)(M+2)\dots M'}{(M'-M)!}\,\dfrac{1}{2^{M'-M}\,S}}.$$

This is always less than $2M$.

It has been seen above that, when $N$ is large, the probable
number of sequences in $N$ spins is $\tfrac12N$, the duration of the
spins not affecting the question. When however a number of
$M$ sequences are observed, and the corresponding probable
number of spins is to be determined, the question of duration
does affect the question, and the probable number of spins is
less than $2M$.
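
The effect of the upper limit $M'$ can be tabulated exactly. The following sketch is illustrative: the function names and the values $M=6$, $M'=15$ and $M'=200$ are assumptions. It computes the probable number of spins by direct summation and, as a cross-check, through an equivalent closed form $2M/(1+c/S)$, where $S$ is the truncated series with leading terms $1+\frac{M+1}{1}\frac12+\dots$ and $c$ is its last term.

```python
from fractions import Fraction
from math import comb

def prob_sequences(N, M):
    """Chance of exactly M sequences (runs) in N spins of a fair coin."""
    return Fraction(comb(N - 1, M - 1), 2 ** (N - 1))

def probable_spins(M, M_max):
    """A posteriori mean of N, all N from M to M_max equally likely a priori."""
    weights = {N: prob_sequences(N, M) for N in range(M, M_max + 1)}
    return sum(N * w for N, w in weights.items()) / sum(weights.values())

def closed_form(M, M_max):
    """2M / (1 + c/S): an equivalent closed form for the same mean."""
    K = M_max - M
    S = sum(Fraction(comb(M + k, k), 2 ** k) for k in range(K + 1))
    c = Fraction(comb(M_max, M), 2 ** K)   # last term of the series S
    return 2 * M / (1 + c / S)

M = 6
short = probable_spins(M, 15)    # tight upper limit on the number of spins
long = probable_spins(M, 200)    # generous upper limit: close to 2M
```

Both values fall below $2M=12$, and the generous limit gives a result that differs from 12 only far beyond the decimal places that matter.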

VII. There are $M$ counters, marked from 1 to $M$, in a bag;
and one is drawn, each being equally likely to be taken. The
counter marked $N$ is drawn, and a coin equally likely to fall
head or tail is spun $2N$ times; and the excess $2n_1$ of heads over
tails is noted. This is repeated $s$ times, $2N$ spins being made
each time; and the excesses of heads over tails are found to be

$$2n_1,\ 2n_2,\ \dots,\ 2n_s.$$

The whole proceeding with the numbers $M, n_1, n_2, \dots, n_s$ is
reported to a calculator, the number $N$ only being withheld
from him. What conclusions can he draw about $N$?

The a priori probability, that $N$ has any given value from
1 to $M$, is $1/M$. If $|n_i|$ is the greatest of the positive numbers
$|n_1|, |n_2|, \dots, |n_s|$, the probability of the observed set of
excesses of heads over tails is zero, when $N < |n_i|$, and is

$$\frac{\{(2N)!\}^s}{2^{2Ns}}\prod_{j=1}^{s}\frac{1}{(N+n_j)!\,(N-n_j)!}$$

when $N \ge |n_i|$.

The approximate value of this latter expression is

$$\frac{1}{(\pi N)^{s/2}}\,e^{-\sigma/N},\qquad\text{where}\quad \sigma=\sum_{j=1}^{s} n_j^2.$$

If then $N > |n_i|$, the calculator infers that the probability,
that the counter drawn was marked $N$, is

$$\frac{N^{-s/2}\,e^{-\sigma/N}}{\sum_{N=|n_i|}^{M} N^{-s/2}\,e^{-\sigma/N}}.$$

The most probable value of $N$ is that which makes $e^{-\sigma/N}/N^{s/2}$
as great as possible. The maximum value of this quantity,
when $N$ varies continuously, is given by

$$N=\frac{2\sigma}{s},$$

so that the most probable value of $N$ is one of the integers on
either side of $2\sigma/s$.

Since $e^{-\sigma/N}/N^{s/2}$, when sensible in value, changes little when $N$
is changed to $N+1$, the probability that $N$ lies between $N_1$
and $N_2$ may be written approximately

$$\int_{N_1}^{N_2} e^{-\sigma/N}\,N^{-s/2}\,dN\bigg/\int_{|n_i|}^{M} e^{-\sigma/N}\,N^{-s/2}\,dN.$$

Putting $\sigma=Nx$, this is

$$\int_{\sigma/N_2}^{\sigma/N_1} e^{-x}\,x^{\frac s2-2}\,dx\bigg/\int_{\sigma/M}^{\sigma/|n_i|} e^{-x}\,x^{\frac s2-2}\,dx.$$
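
The calculator's inference can be sketched in a few lines. This is an illustration under stated assumptions, not the author's own computation: the function name and the data (half-excesses 3, -4, 5, 2 and $M=200$) are hypothetical, and the weight used for each $N$ is the approximate form $e^{-\sigma/N}/N^{s/2}$ with $\sigma=\sum n_j^2$, restricted to values of $N$ not less than the greatest $|n_j|$.

```python
from math import exp

def posterior_over_N(half_excesses, M):
    """Approximate a posteriori law of N: weight exp(-sigma/N) / N**(s/2)."""
    s = len(half_excesses)
    sigma = sum(n * n for n in half_excesses)
    lowest = max(1, max(abs(n) for n in half_excesses))  # N below this is impossible
    weights = {N: exp(-sigma / N) / N ** (s / 2) for N in range(lowest, M + 1)}
    total = sum(weights.values())
    return {N: w / total for N, w in weights.items()}

half_excesses = [3, -4, 5, 2]   # hypothetical n_1..n_s (excesses 6, -8, 10, 4)
post = posterior_over_N(half_excesses, M=200)
mode = max(post, key=post.get)  # most probable N under the approximation
```

Here $\sigma=54$ and $s=4$, so $2\sigma/s=27$, and the search over integers lands on that value.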


CHAPTER  VI 

PROBABILITIES  CONNECTED  WITH 
GEOMETRICAL  QUESTIONS 

20. In Chapter II, a problem was discussed (p. 22) in
connection with the position of points on a line. The line was divided
into $n$ equal parts; and it was assumed that, when a point was
marked on it, the point was as likely to be in any one part as
in any other. It followed that the chance of the point being on
one particular part of the line is $1/n$. Suppose that $AB$ is the
line; let $P$ and $Q$ be two particular points on it, of which $P$ lies
between the $p$th and $(p+1)$th points of division, while $Q$ lies
between the $q$th and $(q+1)$th. The segment $PQ$ of the line
includes $q-p$ complete parts of the line and portions of two
others. The probability, that a marked point lies on the $q-p$
complete parts, is $(q-p)/n$. The probability, that the marked
point lies on $PQ$, is therefore equal to or greater than $(q-p)/n$.
It was shewn, in the same way, to be equal to or less than
$(q-p+2)/n$. Beyond this it is impossible to go without further
data. If $n$ is large, the probability that the marked point lies on
$PQ$ is known between narrow limits; and as $n$ is made larger,
both $(q-p)/n$ and $(q-p+2)/n$ approach the same value,
viz. $PQ/AB$.

Hence  the  supposition  that,  when  a  line  is  divided  into 
n  equal  parts,  a  marked  point  is  as  likely  to  lie  in  any  one 
part  as  in  any  other,  whatever  number  n  may  be  ;  and  the 
supposition,  that  the  probability  of  a  marked  point  lying  on 
any  particular  segment  of  the  line  is  equal  to  the  length  of  the 
segment  divided  by  the  length  of  the  line;  are  equivalent  to 
each  other. 

Either  supposition  is  often  expressed  in  the  form,  that  all 
positions  of  the  point  are  equally  probable.  It  should  be  noticed 
that  this  does  not  involve  all  coordinates  of  the  point  being 
equally  probable,  for  the  coordinate  which  defines  the  position 
of a point may be chosen in a variety of ways. For instance,
the position of $P$ on $AB$ may be defined by $x$, the length $AP$,
or by $y$, the ratio $AP/PB$.

[Figure: the line $AB$ with the point $P$ marked on it.]

In terms of $x$, the probability that the point lies on $P_1P_2$ is

$$\frac{x_2-x_1}{AB}.$$

In terms of $y$, the same probability is

$$\frac{y_2-y_1}{(1+y_1)(1+y_2)}.$$

All  values  of  x  between  0  and  AB  may  be  described  as 
equally  likely;  but  all  values  of  y  cannot  be  so  described. 
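
The two expressions can be compared on a unit line. The sketch below is illustrative only: the function names and the sample points are assumptions. It converts two positions $x_1$, $x_2$ into the ratios $y=AP/PB$ and checks that both formulas give the same probability, while equal steps in $y$ do not carry equal probability.

```python
from fractions import Fraction

def prob_from_x(x1, x2, length=1):
    """Chance that the point lies between AP = x1 and AP = x2."""
    return Fraction(x2 - x1) / length

def prob_from_y(y1, y2):
    """The same chance expressed through the ratio y = AP/PB."""
    return Fraction(y2 - y1) / ((1 + y1) * (1 + y2))

def ratio(x, length=1):
    """y = AP/PB for a point at distance x from A."""
    return Fraction(x) / (length - x)

x1, x2 = Fraction(1, 5), Fraction(1, 2)   # illustrative positions on a unit line
y1, y2 = ratio(x1), ratio(x2)
same = prob_from_x(x1, x2) == prob_from_y(y1, y2)
unequal_steps = (prob_from_y(0, 1), prob_from_y(1, 2))  # equal steps in y
```

The last pair shows why all values of $y$ cannot be described as equally likely: the interval from 0 to 1 carries three times the probability of the interval from 1 to 2.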

Assuming that the probability of a marked point on $AB$
lying in the segment $AP$ has a definite meaning, it must
depend on the position of $P$, i.e. it must be a function of $x$,
if $AP=x$. Denote it by $f(x)$. Then $f(x)$ is necessarily a
function for which $f(x_2)-f(x_1)\ge 0$, if $x_2-x_1>0$; for if $P_2$ lies
between $P_1$ and $B$, the probability that a point lies in $AP_2$
cannot be less than the probability that it lies in $AP_1$. Suppose
that $f(x)$ is discontinuous at $x=x_0$, so that

$$f(x_0+\alpha)-f(x_0-\beta)\ge k,$$

however small $\alpha$ and $\beta$ may be. If

$$AC=x_0,\qquad AC'=x_0-\beta,\qquad AC''=x_0+\alpha,$$

then the probability that the point lies on the segment $C'C''$
is equal to or greater than $k$, whatever points $C'$, $C''$ may be to
the left and right of $C$ respectively. This clearly implies that
there is a finite probability that the marked point has the
particular position $C$. Hence, if there are no particular points
on $AB$ of this nature, $f(x)$ must be a continuous function.
Assuming further that $f(x)$ has a differential coefficient, the
probability that a marked point lies on a segment $\delta x$ of the
line, when $\delta x$ is small enough, is $f'(x)\,\delta x$; and the fact, that
the point must be somewhere between $A$ and $B$, is given by
the condition

$$\int_0^{AB} f'(x)\,dx=1.$$

Conversely, if it is assumed that the probability of a point
lying on a sufficiently small segment $\delta x$ of a line is proportional
to $F(x)\,\delta x$, where $x$ is the distance from one end, then the
actual probability is

$$\frac{F(x)\,\delta x}{\int_0^l F(x)\,dx},$$

where $l$ is the length of the line.

21. A precisely similar method may be used with respect to
a point marked on a plane area. If $x$ and $y$ are rectangular
coordinates in the plane of the area, and if it is known that there
are no particular points on the area such that the probability
of the marked point coinciding with one of them is finite, then
when $\delta x$ and $\delta y$ are small enough, the probability of the marked
point lying in the rectangle bounded by $x$, $y$, $x+\delta x$, $y+\delta y$,
may be denoted by

$$f(x,y)\,\delta x\,\delta y,$$

subject to the condition

$$\iint f(x,y)\,dx\,dy=1,$$

where the integral extends over the area within which the point
is known to lie.

In particular, if $A$ is the area, and if $f(x,y)$ is a constant,
then the probability is

$$\frac{\delta x\,\delta y}{A},$$

and all positions of the marked point are said to be equally
likely.

More generally, if $x_1, x_2, \dots, x_n$ are $n$ independent quantities
continuously varying over a certain range, and if the probability
of their having values confined to some smaller range has
a definite meaning, then when $\delta x_1, \delta x_2, \dots$ are small enough,
the probability of their having a system of values lying between
$x_1$ and $x_1+\delta x_1$, $x_2$ and $x_2+\delta x_2$, ..., will be of the form

$$\phi(x_1,x_2,\dots,x_n)\,\delta x_1\,\delta x_2\dots\delta x_n,$$

subject to the condition that

$$\int\!\!\int\!\dots\!\int \phi(x_1,x_2,\dots,x_n)\,dx_1\,dx_2\dots dx_n=1,$$

where the integral is extended over the whole range of the
variables.

It may be convenient, for purposes of calculation, to use new
variables $y_1, y_2, \dots, y_n$, functions of the old. The method of the
Integral Calculus shews that

$$\int\!\!\int\!\dots\!\int \phi(x_1,x_2,\dots,x_n)\,dx_1\,dx_2\dots dx_n,$$

extended over a certain range of the $x$'s, which expresses
the probability that the $x$'s shall have values within that
range, becomes

$$\int\!\!\int\!\dots\!\int D\,\psi(y_1,y_2,\dots,y_n)\,dy_1\,dy_2\dots dy_n,$$

extended over the corresponding range of the $y$'s; where
$\psi(y_1,y_2,\dots,y_n)$ is $\phi(x_1,x_2,\dots,x_n)$ expressed in terms of the
$y$'s, and

$$D=\begin{vmatrix}
\dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_2}{\partial y_1} & \dots & \dfrac{\partial x_n}{\partial y_1}\\[2ex]
\dfrac{\partial x_1}{\partial y_2} & \dfrac{\partial x_2}{\partial y_2} & \dots & \dfrac{\partial x_n}{\partial y_2}\\[1ex]
\vdots & \vdots & & \vdots\\[1ex]
\dfrac{\partial x_1}{\partial y_n} & \dfrac{\partial x_2}{\partial y_n} & \dots & \dfrac{\partial x_n}{\partial y_n}
\end{vmatrix}.$$

Hence the probability that the $y$'s should have values between
$y_1$ and $y_1+\delta y_1$, $y_2$ and $y_2+\delta y_2$, ..., is

$$D\,\psi(y_1,y_2,\dots,y_n)\,\delta y_1\,\delta y_2\dots\delta y_n.$$
Returning to the case of two independent variables, denote
as before by $f(x,y)\,\delta x\,\delta y$ the probability that the variables lie
between $x$ and $x+\delta x$, $y$ and $y+\delta y$ respectively.

If the range of the variables is unlimited, the probability
that the first lies between $x$ and $x+\delta x$ is

$$\delta x\int_{-\infty}^{\infty} f(x,y)\,dy,$$

which is of the form $F(x)\,\delta x$, $F(x)$ being a function of $x$ only.
Similarly, if

$$\int_{-\infty}^{\infty} f(x,y)\,dx=G(y),$$

the probability that the second variable lies between $y$ and
$y+\delta y$ is $G(y)\,\delta y$.

Suppose now that it is known that the probability $p_A$, that
the first variable lies between $x$ and $x+\delta x$, is $F(x)\,\delta x$, while the
similar probability $p_B$ for the second variable is $G(y)\,\delta y$. We
have seen that

$$p_{AB}=p_A\,p_B+p_{AB}\,p_{A'B'}-p_{AB'}\,p_{A'B},$$

and unless $p_{AB}\,p_{A'B'}-p_{AB'}\,p_{A'B}$ is zero, it does not follow that the
probability, that the two variables lie between $x$ and $x+\delta x$, $y$ and
$y+\delta y$ respectively, is $F(x)\,G(y)\,\delta x\,\delta y$. Now the assumed data
give no information as to the value of $p_{AB}\,p_{A'B'}-p_{AB'}\,p_{A'B}$; and
therefore, from the assumed data, it is not possible to
determine the probability that the two variables simultaneously lie
between given limits.

The  same  is  obviously  true  when  the  range  of  the  variables 
is  limited.  A  similar  result  holds  when  the  number  of  variables 
exceeds  two. 
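
A small numerical case shows why the separate probabilities do not fix the joint one. The sketch below is illustrative: the two tables of four probabilities are assumptions chosen so that $p_A=p_B=\tfrac12$ in both. It checks the identity $p_{AB}=p_Ap_B+p_{AB}p_{A'B'}-p_{AB'}p_{A'B}$ and shows that $p_{AB}$ differs between the tables.

```python
from fractions import Fraction

def marginals(table):
    """table = (p_AB, p_AB', p_A'B, p_A'B'); returns (p_A, p_B)."""
    p_ab, p_ab_, p_a_b, p_a_b_ = table
    return p_ab + p_ab_, p_ab + p_a_b

def identity_rhs(table):
    """p_A p_B + p_AB p_A'B' - p_AB' p_A'B, which should reproduce p_AB."""
    p_ab, p_ab_, p_a_b, p_a_b_ = table
    p_a, p_b = marginals(table)
    return p_a * p_b + p_ab * p_a_b_ - p_ab_ * p_a_b

F = Fraction
independent = (F(1, 4), F(1, 4), F(1, 4), F(1, 4))   # correction term is zero
dependent = (F(2, 5), F(1, 10), F(1, 10), F(2, 5))   # same marginals, term not zero
```

Both tables give the same $p_A$ and $p_B$, yet $p_{AB}$ is 1/4 in the first and 2/5 in the second; only in the first is it the product $p_Ap_B$.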

Illustrations. 

22.  In  the  following  illustrations  it  will  be  assumed,  unless 
the  opposite  is  stated,  that  all  positions  of  a  point  marked  on  a 
line  are  equally  likely. 

I.  A  point  is  marked  at  random  on  a  unit  line.  What  is  the 
probable  value  of  the  sum  of  the  squares  of  the  two  parts  into 
which  it  divides  the  line  ? 

The probability, that the distance of the marked point from
one end of the line lies between $x$ and $x+\delta x$, is $\delta x$. The
corresponding value of the sum of the squares of the two parts $s$ is
given by

$$s=1-2x+2x^2.$$

Hence the probable value of $s$ is

$$\int_0^1(1-2x+2x^2)\,dx=\tfrac23.$$

It is instructive to consider this simple example from another
point of view. When $s$ is given, there are two values of $x$, viz.

$$x_1=\tfrac12+\sqrt{\tfrac12s-\tfrac14},\qquad x_2=\tfrac12-\sqrt{\tfrac12s-\tfrac14}.$$

Hence

$$\delta x_1=\tfrac14\,\frac{\delta s}{\sqrt{\tfrac12s-\tfrac14}},\qquad -\delta x_2=\tfrac14\,\frac{\delta s}{\sqrt{\tfrac12s-\tfrac14}};$$

and if the sum of the squares of the parts lies between $s$ and
$s+\delta s$, the point must lie on one of two segments of the line,
each of which is of length $\tfrac14\,\dfrac{\delta s}{\sqrt{\tfrac12s-\tfrac14}}$. The probability for this
is $\tfrac12\,\dfrac{\delta s}{\sqrt{\tfrac12s-\tfrac14}}$.

Hence when the point is marked, the probability that the
sum of the squares of the parts lies between $s$ and $s+\delta s$ is
$\dfrac{\delta s}{\sqrt{2s-1}}$. The extreme values of $s$ are $\tfrac12$ and 1. Hence the
probable value of $s$ is

$$\int_{1/2}^{1}\frac{s\,ds}{\sqrt{2s-1}}=\tfrac23.$$

II.  Two  points  are  marked  at  random  on  a  unit  line.  What  is 
the  probable  value  of  the  sum  of  the  squares  of  the  three  parts  ? 

The probability, that the distances of the two points from
one end of the line lie between $x$ and $x+\delta x$, $y$ and $y+\delta y$
respectively, is $\delta x\,\delta y$. If $y$ is less than $x$, the sum of the squares
is $y^2+(x-y)^2+(1-x)^2$; if $y$ is greater than $x$, the sum is
$x^2+(y-x)^2+(1-y)^2$. Hence the probable value is

$$\int_0^1 dx\left[\int_0^x\{y^2+(x-y)^2+(1-x)^2\}\,dy+\int_x^1\{x^2+(y-x)^2+(1-y)^2\}\,dy\right]=\tfrac12.$$

III. A point $P_1$ is marked at random on a unit line $AB$, and
then a point $P_2$ is marked at random on $P_1B$. What is the
probable value of the sum of the squares of the three parts?

The probability that $AP_1$ lies between $x$ and $x+\delta x$ is $\delta x$;
and the probability that $P_1P_2$ lies between $y$ and $y+\delta y$ is
$\dfrac{\delta y}{1-x}$. Hence the required probable value is

$$\int_0^1 dx\int_0^{1-x}\frac{dy}{1-x}\,\{x^2+y^2+(1-x-y)^2\}=\tfrac59.$$
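
The three probable values found in Illustrations I to III (2/3, 1/2 and 5/9) can be checked by a plain midpoint rule. The sketch below is illustrative: the function names and the step count are assumptions, and for Illustration III the second point is written as $P_1P_2=(1-x)t$ with $t$ uniform on $(0,1)$, which is only a convenient restatement of "marked at random on $P_1B$".

```python
def midpoint_mean(f, dims, steps=400):
    """Mean of f over the unit interval or unit square by the midpoint rule."""
    h = 1.0 / steps
    pts = [(k + 0.5) * h for k in range(steps)]
    if dims == 1:
        return sum(f(x) for x in pts) * h
    return sum(f(x, y) for x in pts for y in pts) * h * h

def two_parts(x):
    # Illustration I: one point at distance x from an end.
    return x**2 + (1 - x)**2

def three_parts(x, y):
    # Illustration II: two independent points at distances x and y.
    lo, hi = min(x, y), max(x, y)
    return lo**2 + (hi - lo)**2 + (1 - hi)**2

def three_parts_sequential(x, t):
    # Illustration III: P1 at x, then P2 at random on P1B, so P1P2 = (1 - x) * t.
    y = (1 - x) * t
    return x**2 + y**2 + (1 - x - y)**2

example_1 = midpoint_mean(two_parts, 1)
example_2 = midpoint_mean(three_parts, 2)
example_3 = midpoint_mean(three_parts_sequential, 2)
```

The ordering of the three results reflects the text: marking the second point only on the remaining part $P_1B$ gives a less even division than marking both points independently on the whole line.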

IV. When $n-1$ points are marked on a unit line, they
divide it into $n$ segments. The lengths $x_1, x_2, \dots, x_{n-1}$ of $n-1$
of these are arbitrary, subject to the condition that their sum
does not exceed unity. The question suggests itself: What is
the probability that the $n-1$ segments have lengths lying
between $x_1$ and $x_1+\delta x_1$, $x_2$ and $x_2+\delta x_2$, ... respectively?

Denote the distances of the $n-1$ points of division from one
end of the line by $y_1, y_2, \dots, y_{n-1}$. When the points are assigned,
the segments are unaltered for any permutation of the points
among themselves. For one particular sequence of the points,
the $y$'s will be in ascending order, so that

$$\begin{aligned}
y_1&=x_1,\\
y_2&=x_1+x_2,\\
&\;\;\vdots\\
y_{n-1}&=x_1+x_2+\dots+x_{n-1}.
\end{aligned}$$

Since there are $(n-1)!$ permutations of the points, the
probability that the segments will have the values
corresponding to this set of points is $(n-1)!\,\delta y_1\,\delta y_2\dots\delta y_{n-1}$.

Now the Jacobian $D$ of the above set of equations is unity.
Hence, by the change-of-variables theorem of § 21, the probability
that the $n-1$ segments have lengths between $x_1$ and $x_1+\delta x_1$,
$x_2$ and $x_2+\delta x_2$, ... is

$$(n-1)!\,\delta x_1\,\delta x_2\dots\delta x_{n-1}.$$

This agrees with the fact that

$$\int_0^1\!\int_0^{1-x_1}\!\!\dots\int_0^{1-x_1-x_2-\dots-x_{n-2}} dx_1\,dx_2\dots dx_{n-1}=\frac{1}{(n-1)!}.$$

Thus, if a unit line has $n-1$ points marked at random on it,
the probable value of the sum of the squares of the $n$ parts
into which they divide the line is

$$(n-1)!\int_0^1 dx_1\int_0^{1-x_1}dx_2\dots\int_0^{1-x_1-\dots-x_{n-2}} S\,dx_{n-1}=\frac{2}{n+1},$$

where $S$ denotes $x_1^2+x_2^2+\dots+x_{n-1}^2+(1-x_1-\dots-x_{n-1})^2$.
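
The value $2/(n+1)$ lends itself to a simulation check. The following is a rough Monte Carlo sketch, not an exact computation: the function name, the number of trials and the fixed seed are assumptions, and the estimates carry sampling error of a few parts in ten thousand.

```python
import random

def mean_sum_of_squares(n, trials=100_000, seed=0):
    """Monte Carlo estimate for n parts cut by n - 1 random points on a unit line."""
    rng = random.Random(seed)          # fixed seed so that runs are repeatable
    total = 0.0
    for _ in range(trials):
        cuts = sorted(rng.random() for _ in range(n - 1))
        edges = [0.0] + cuts + [1.0]
        total += sum((b - a) ** 2 for a, b in zip(edges, edges[1:]))
    return total / trials

# Estimates for 2, 3 and 5 parts, to be compared with 2/3, 1/2 and 1/3.
estimates = {n: mean_sum_of_squares(n) for n in (2, 3, 5)}
```

The cases $n=2$ and $n=3$ reproduce the probable values 2/3 and 1/2 of Illustrations I and II.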

V. It is interesting to notice how these results are modified,
when all positions of the marked points are not equally likely.
Suppose that the probability of the distance of a point from
one end of the unit line lying between $x$ and $x+\delta x$ is
proportional to $x(1-x)\,\delta x$, so that the point is more likely to lie
in the central part of the line than at the ends. Then, since

$$\int_0^1 x(1-x)\,dx=\tfrac16,$$

the probable value of the sum of the squares of the two parts,
into which a point divides the line, is

$$6\int_0^1 x(1-x)(1-2x+2x^2)\,dx=\tfrac35.$$

As would be expected, this is less than when all positions of the
point are equally likely.
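
Both integrals are polynomial, so they can be evaluated exactly. The sketch below is illustrative (the helper names are assumptions): polynomials are held as coefficient lists, multiplied, and integrated term by term over the unit interval.

```python
from fractions import Fraction

def integrate_poly(coeffs):
    """Integral over [0, 1] of sum(coeffs[k] * x**k)."""
    return sum(Fraction(c, k + 1) for k, c in enumerate(coeffs))

def multiply(p, q):
    """Product of two polynomials given as coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

weight = [0, 1, -1]     # x(1 - x), the assumed law of position
squares = [1, -2, 2]    # 1 - 2x + 2x^2, the sum of the squares of the two parts
normaliser = integrate_poly(weight)
probable = integrate_poly(multiply(weight, squares)) / normaliser
uniform_case = integrate_poly(squares)   # all positions equally likely
```

The weighted result is 3/5 against 2/3 for equally likely positions, the direction of change stated in the text.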

VI.  From  the  point  of  view  of  the  present  chapter,  the  result 
of  Ex.  X  of  Chapter  ii  (p.  22)  may  be  expressed  as  follows: — If  n 
points  are  marked  on  a  unit  line  and  all  positions  are  equally 
likely  for  each  point,  the  probability  that  the  n  points  all  lie  on 
a  continuous  portion  of  the  line  of  length  x  is  nx^''^  —  (n-  l)x^. 

The  problem  takes  a  rather  different  form  when  the  line, 
on  which  the  points  are  marked,  is  closed.  Suppose  points 
are  marked  on  a  closed  curve  of  unit  length :  and  assume 
that  the  probability,  that  a  marked  point  lies  on  a  given 
continuous  segment  of  the  curve  of  length  I,  is  equal  to  I;  i.e. 
in  the  sense  already  used,  that  all  positions  of  the  point  are 
equally  likely.  Then,  when  n  points  are  marked  on  the  curve, 
what  is  the  probability  that  some  continuous  segment  of  the 
curve  of  length  x  is  free  from  points  ?  Assign  a  positive  direction 
along  the  curve ;  and  starting  from  each  of  the  n  points,  lay 
off a length $1-x$ in the positive direction. If a portion $x$ of
the curve is free from points, the $n$ points must all lie on
one of these $n$ segments of length $1-x$. The probability, that
the $n$ points all lie on a particular one of these segments, is
$(1-x)^{n-1}$.

First, let $x$ be greater than $\tfrac12$. Then, if the points lie on a
particular one of the segments, they cannot lie on any other;
and the required probability is $n(1-x)^{n-1}$.

Next, suppose that $\tfrac12>x>\tfrac13$. Two segments may now have in
common a part of total length $1-2x$, which consists of two arcs
starting respectively where the segments start. The case, in which
the $n$ points all lie on this portion of length $1-2x$, has been taken
into account twice, once with each of the segments. Hence when
$\tfrac12>x>\tfrac13$, the required probability is

$$n(1-x)^{n-1}-\frac{n(n-1)}{2!}(1-2x)^{n-1}.$$

There is no difficulty in continuing this reasoning. The general
result is that, when $\dfrac1r>x>\dfrac1{r+1}$, the required probability is

$$\sum_{i=1}^{r}(-1)^{i+1}\frac{n!}{i!\,(n-i)!}(1-ix)^{n-1}.$$
It is not difficult to shew that, when $n$ is large and $x$ is $\dfrac{\log n}{n}$,
this result is roughly ·63, and that it diminishes rapidly for larger
values of $x$. Hence, when a large number $n$ of points are marked,
the probability of a gap between them materially exceeding
$\dfrac{\log n}{n}$ is very small.
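The alternating series and the statement about $x=\log n/n$ can be tried numerically. This is a sketch only: `gap_probability` is an illustrative name, and the series is summed over those $i$ for which $1-ix$ is positive.

```python
import math

def gap_probability(n, x):
    """Probability that some arc of length x on a closed unit curve is free
    from all n marked points, summed from the alternating series."""
    total = 0.0
    i = 1
    while i <= n and i * x < 1:
        total += (-1) ** (i + 1) * math.comb(n, i) * (1 - i * x) ** (n - 1)
        i += 1
    return total

n = 200
x = math.log(n) / n
print(gap_probability(n, x))      # to be compared with the rough value .63
print(gap_probability(n, 2 * x))  # much smaller for a gap twice as long
```

For moderate $n$ the first value is already close to $1-e^{-1}$, which is the limit the series approaches when each term is replaced by $1/i!$.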

VII. The problem of the comparative regularity of a random
distribution of points on a closed curve may be looked at from
another point of view. With the notation already used, the
probability $p$, that just $m$ of the $n$ points lie on a continuous
portion, length $x$, of the unit curve, is given by

$$p=\frac{n!}{m!\,(n-m)!}\,x^m(1-x)^{n-m}.$$

Putting $m=xn+y\sqrt n$,

and  using  the  approximate  expressions  for  the  factorials,  it  is 
found  that,  when  terms  of  the  order  l/?i^  are  neglected, 

$$\log p=-\tfrac12\log 2\pi nx(1-x)-\frac{y^2}{2x(1-x)}-\frac{(1-2x)\,y}{2x(1-x)\sqrt n};$$

so that, reintroducing $m$,

$$p=e^{\frac{(2x-1)^2}{8nx(1-x)}}\cdot\frac{1}{\sqrt{2\pi nx(1-x)}}\,e^{-\frac{\{m-x(n+1)+\frac12\}^2}{2nx(1-x)}}.$$

If $x$ is $n^{-1+\alpha}$, where $\alpha$ is positive, the first factor of $p$ is very
nearly unity when $n$ is sufficiently great. Hence $p$ may be
written

$$\frac{1}{\sqrt{\pi h}}\,e^{-\frac{(m-m_0)^2}{h}},$$

where $h$ stands for $2nx(1-x)$ and $m_0$ for $x(n+1)-\tfrac12$, so that $h$ is very
nearly $2n^{\alpha}$ and $m_0$ very nearly $n^{\alpha}$;
and the probability, that the number of points on a segment of
length $n^{-1+\alpha}$ lies between $m_1$ and $m_2$, is

$$\sum_{m=m_1}^{m_2}\frac{1}{\sqrt{\pi h}}\,e^{-\frac{(m-m_0)^2}{h}}.$$

Now tables already quoted shew that

$$\int_{m_0-2\sqrt h}^{m_0+2\sqrt h}\frac{1}{\sqrt{\pi h}}\,e^{-\frac{(m-m_0)^2}{h}}\,dm=\text{·995 nearly},$$

and the sum differs very little from the integral. Hence, when
$n$ is large enough, the probability that the number of points on
an arc of length $n^{-1+\alpha}$ lies between $n^{\alpha}+2\sqrt{2n^{\alpha}}$ and $n^{\alpha}-2\sqrt{2n^{\alpha}}$ is
·995. For such an arc then, however small $\alpha$ may be, the probability
is great that the point-density differs very little from its
mean value. No such conclusion can be drawn for an arc of
length $n^{-1}$, as is obvious from the nature of the problem.
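The figure ·995 can be compared with the exact binomial sum for a particular case. The sketch below assumes $n=10^6$ and $\alpha=\tfrac12$ purely for illustration. The quantities `h` and `m0` are $2nx(1-x)$ and $x(n+1)-\tfrac12$, as in the exponential form of $p$, and the function names are not from the text.

```python
import math

def log_binomial_term(n, m, x):
    """Logarithm of n!/(m!(n-m)!) x^m (1-x)^(n-m), via log-gamma."""
    return (math.lgamma(n + 1) - math.lgamma(m + 1) - math.lgamma(n - m + 1)
            + m * math.log(x) + (n - m) * math.log1p(-x))

def probability_within(n, alpha):
    """Exact probability that the number of points on an arc of length
    n**(alpha - 1) lies within 2*sqrt(h) of m0."""
    x = n ** (alpha - 1)
    h = 2 * n * x * (1 - x)
    m0 = x * (n + 1) - 0.5
    lo = max(math.ceil(m0 - 2 * math.sqrt(h)), 0)
    hi = min(math.floor(m0 + 2 * math.sqrt(h)), n)
    return sum(math.exp(log_binomial_term(n, m, x)) for m in range(lo, hi + 1))

print(probability_within(10**6, 0.5))
```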

23. VIII. It will be assumed in what follows that, when a point
is marked on a unit sphere, all positions of the point are equally
likely, in the sense that, if $S$ is the area of the spherical surface on
one side of a closed curve drawn on the surface, then the probability
that a marked point lies on that side of the curve is $\dfrac{S}{4\pi}$.

(i) Two points are marked on a unit sphere. What is the probability
that the angular distance between them does not exceed $\alpha$?

If $A$ is one of the points, and a small circle of radius $\alpha$ is
described with $A$ as centre, its area is $2\pi(1-\cos\alpha)$. Now, if the
distance between the two points does not exceed $\alpha$, the second
point must lie either on the small circle or on the same side
of it as $A$. Hence the required probability is $\tfrac12(1-\cos\alpha)$.

It follows at once that the probability, that the distance
between two points marked on the sphere lies between $\alpha$ and
$\alpha+\delta\alpha$, is $\tfrac12\sin\alpha\,\delta\alpha$.

(ii) Three points are marked on a unit sphere. What is the
probability that there is a small circle of radius $\alpha\,(<\tfrac12\pi)$, on or
within which all three points lie?

Let $PQR$, $PQS$ be two small circles on the sphere of radius $\alpha$.
With $P$ as centre and radius $2\alpha$, describe the arc $R_1T_1S_1$ of a
small circle, touching the above circles in $R_1$, $S_1$; and with $Q$ as
centre and radius $2\alpha$, describe the arc $R_2T_2S_2$ touching the circles
in $R_2$, $S_2$. Then an inspection of the figure shews at once that
any point $U$ within the closed curve $RR_1T_1S_1SS_2T_2R_2R$ is
such that $P$, $Q$, $U$ lie within a small circle of radius $\alpha$; while
if $U$ is without this closed curve, $P$, $Q$, $U$ do not lie within any
small circle of radius $\alpha$. Let $O$, $O'$ be the centres of $PQR$, $PQS$.
Then

$$\begin{aligned}\text{area } RR_1T_1S_1SS_2T_2R_2R={}&2\,\text{area } POR_1T_1S_1O'P\\&+2\,\text{area } OR_1RR_2O\\&-\text{area } POQO'P.\end{aligned}$$

If the angles $OPO'$ and $POQ$ are $\beta$ and $\gamma$, then

$$\begin{aligned}\text{area } POR_1T_1S_1O'P&=\beta(1-\cos2\alpha),\\\text{area } OR_1RR_2O&=\gamma(1-\cos\alpha),\\\text{area } POQO'P&=2\beta+2\gamma-2\pi.\end{aligned}$$

Hence the probability, that $U$ lies within $RR_1T_1S_1SS_2T_2R_2R$, is

$$\frac{2\pi-2\beta\cos2\alpha-2\gamma\cos\alpha}{4\pi}.$$

If the distance $PQ$ is $\theta$, it has been seen that the probability
of the distance of two points on the sphere being between $\theta$ and
$\theta+\delta\theta$ is $\tfrac12\sin\theta\,\delta\theta$. Hence the probability, that there is a small
circle of radius $\alpha$ containing the three points, is

$$\frac{1}{4\pi}\int_0^{2\alpha}\sin\theta\,(\pi-\beta\cos2\alpha-\gamma\cos\alpha)\,d\theta,$$

while from the spherical quadrilateral $OPO'Q$,

$$\cos\frac{\beta}{2}=\cot\alpha\tan\frac{\theta}{2},\qquad\cos\frac{\gamma}{2}=\sin\frac{\beta}{2}\cos\frac{\theta}{2}.$$

The integral is readily evaluated; and the required probability
is found to be

$$(1-\cos\alpha)^2\,(1+\tfrac54\cos\alpha).$$

This is unity when $\alpha=\tfrac12\pi$, as it obviously should be.

From the result it follows that, when three points are marked
on a unit sphere, the probability, that the radius of the small
circle through them lies between $\alpha$ and $\alpha+\delta\alpha$, is

$$\tfrac34\sin\alpha\,(1-\cos\alpha)(1+5\cos\alpha)\,\delta\alpha.$$
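The evaluation of the integral can be checked numerically against the closed form $(1-\cos\alpha)^2(1+\tfrac54\cos\alpha)$. The sketch below uses a plain midpoint rule, with $\beta$ and $\gamma$ found from the two relations of the spherical quadrilateral. The function names and the step count are illustrative assumptions.

```python
import math

def three_point_probability(alpha, steps=20_000):
    """Midpoint-rule value of (1/4pi) * integral over theta from 0 to 2*alpha
    of sin(theta) * (pi - beta*cos(2 alpha) - gamma*cos(alpha)),
    for a radius alpha less than pi/2."""
    width = 2 * alpha / steps
    total = 0.0
    for k in range(steps):
        theta = (k + 0.5) * width
        # cos(beta/2) = cot(alpha) tan(theta/2); clamp against rounding.
        half_beta = math.acos(min(1.0, math.tan(theta / 2) / math.tan(alpha)))
        # cos(gamma/2) = sin(beta/2) cos(theta/2).
        half_gamma = math.acos(min(1.0, math.sin(half_beta) * math.cos(theta / 2)))
        total += math.sin(theta) * (math.pi
                                    - 2 * half_beta * math.cos(2 * alpha)
                                    - 2 * half_gamma * math.cos(alpha))
    return total * width / (4 * math.pi)

def closed_form(alpha):
    c = math.cos(alpha)
    return (1 - c) ** 2 * (1 + 1.25 * c)

for alpha in (0.3, 1.0, 1.4):
    print(alpha, three_point_probability(alpha), closed_form(alpha))
```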

(iii) Three points $A$, $B$, $C$ are marked on a unit sphere, all
positions of each being equally likely. They are joined by the
shorter arcs of the great circles $BC$, $CA$, and $AB$. What is the
probable area of the spherical triangle $ABC$ so formed?

The probability, that the arc $BC$ lies between $a$ and $a+da$, is
$\tfrac12\sin a\,da$, $a$ lying between $0$ and $\pi$. Similarly the probability,
that the arc $CA$ lies between $b$ and $b+db$, is $\tfrac12\sin b\,db$. The
angle $ACB$ lies between $0$ and $\pi$; and the probability, that it
lies between $C$ and $C+dC$, is $\dfrac1\pi\,dC$.

Hence, when all positions of each of the three points are
equally likely, the probability that the elements $BC$, $CA$, and
angle $ACB$ lie respectively between $a$ and $a+da$, $b$ and $b+db$,
$C$ and $C+dC$, is

$$\frac{1}{4\pi}\sin a\sin b\,da\,db\,dC.$$

Now the area of the spherical triangle is $A+B+C-\pi$.

Hence the required probable value of the area is

$$\frac{1}{4\pi}\int_0^\pi\!\!\int_0^\pi\!\!\int_0^\pi(A+B+C-\pi)\sin a\sin b\,da\,db\,dC.$$

Now

$$\int(A+B)\,dC=(A+B)\,C-\int C\,(dA+dB).$$

From the trigonometry of the spherical triangle,

$$\tan\tfrac12(A+B)=\frac{\cos\tfrac12(a-b)}{\cos\tfrac12(a+b)}\cot\tfrac12C;$$

thus

$$dA+dB=-\frac{\cos a+\cos b}{1+\cos a\cos b+\sin a\sin b\cos C}\,dC,$$

so that

$$\int_0^\pi(A+B)\,dC=\Big[(A+B)\,C\Big]_{C=0}^{C=\pi}+\int_0^\pi\frac{C\,(\cos a+\cos b)\,dC}{1+\cos a\cos b+\sin a\sin b\cos C}.$$

Now it is clear from a figure that, when $C=\pi$, $A+B$ is $0$ or
$2\pi$, according as $a+b$ is less or greater than $\pi$.
Further, it is easy to verify that

$$\int_0^\pi\!\!\int_0^\pi\frac{(\cos a+\cos b)\sin a\sin b\,da\,db}{1+\cos a\cos b+\sin a\sin b\cos C}=0.$$

Hence

$$\int_0^\pi\!\!\int_0^\pi\!\!\int_0^\pi(A+B)\sin a\sin b\,da\,db\,dC=\int_0^\pi\!\!\int_0^\pi k\sin a\sin b\,da\,db,$$

where $k$ is $0$ if $a+b<\pi$, $k=2\pi^2$ if $a+b>\pi$: so the integral
$=4\pi^2$.

Also

$$\int_0^\pi\!\!\int_0^\pi\!\!\int_0^\pi(C-\pi)\sin a\sin b\,da\,db\,dC=-2\pi^2.$$

Hence the required probable value of the area is $\tfrac12\pi$.

In  a  similar  way  the  probability,  that  the  area  of  the  triangle 
should  lie  between  S  and  S  +  dS,  may  be  determined. 
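The probable area $\tfrac12\pi$ can be compared with a simulation. This is a sketch under stated assumptions, not the method of the text. Points are drawn uniformly by normalising three normal deviates. The area is taken from the standard solid-angle formula $\tan\tfrac12\Omega=|\mathbf a\cdot(\mathbf b\times\mathbf c)|/(1+\mathbf a\cdot\mathbf b+\mathbf b\cdot\mathbf c+\mathbf c\cdot\mathbf a)$ rather than from $A+B+C-\pi$. All names are illustrative.

```python
import math
import random

def random_point(rng):
    """A point on the unit sphere, all positions equally likely."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-9:
            return [c / norm for c in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def triangle_area(a, b, c):
    """Area of the spherical triangle whose sides are the shorter arcs."""
    triple = abs(dot(a, cross(b, c)))
    denom = 1 + dot(a, b) + dot(b, c) + dot(c, a)
    return 2 * math.atan2(triple, denom)

def mean_triangle_area(trials=50_000, seed=1):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += triangle_area(random_point(rng), random_point(rng), random_point(rng))
    return total / trials

print(mean_triangle_area(trials=10_000), math.pi / 2)
```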

24. IX. The position of a point on a sphere is determined by
two angles $\theta$ and $\phi$, its co-latitude and longitude measured from a
given pole and a given meridian. The position of a figure, of
given shape, on a sphere may be determined as follows. Let
$A$, $B$ be two marked points of the figure. Denote by $\theta$ and $\phi$ the
co-latitude and longitude of $A$ from a pole $O$, and by $\psi$ the
angle $OAB$. Then the position of the figure is completely
determined by the three angles $\theta$, $\phi$, $\psi$, and all positions are
given by values of these angles lying respectively between $0$
and $\pi$, $0$ and $2\pi$, $0$ and $2\pi$. The probability, that the figure has
a position in which the three angles lie between $\theta$ and $\theta+\delta\theta$,
$\phi$ and $\phi+\delta\phi$, $\psi$ and $\psi+\delta\psi$, will be of the form

$$F(\theta,\phi,\psi)\,\delta\theta\,\delta\phi\,\delta\psi.$$

Now an element of area of the sphere surrounding $A$ is

$$\sin\theta\,\delta\theta\,\delta\phi.$$

Hence the probability, that $A$ lies in the element of area $\delta S$,
and $OAB$ lies between $\psi$ and $\psi+\delta\psi$, is

$$\frac{F(\theta,\phi,\psi)}{\sin\theta}\,\delta S\,\delta\psi.$$

When $F(\theta,\phi,\psi)/\sin\theta$ is a constant, all positions of the figure
are said to be equally probable; and the constant is $1/8\pi^2$, since

$$\int_0^\pi d\theta\int_0^{2\pi}d\phi\int_0^{2\pi}d\psi\;F(\theta,\phi,\psi)=1.$$

Suppose two curves of arbitrary shape, but of finite length,
are drawn on the sphere: and that when displaced, without
change of shape, all positions of each are equally probable. What
is the probable number of their intersections? This question is
due to M. Poincaré*; but the solution given here is somewhat
different from his.

* Calcul des Probabilités, p. 122.

Whatever positions are given to the two curves, one of them
may, by a rotation of the sphere as a whole carrying the
curves with it, be brought into a standard position: so that
the generality of the question is not affected, by supposing one
of the curves fixed and the other equally likely to take any
position with respect to it.

First, suppose the two curves to be arcs of great circles subtending
angles $\alpha$ and $\beta$ at the centre, and suppose that either $\alpha$ or $\beta$
is less than $\pi$; so that, if the curves intersect at all, they can only
intersect once. The probability of their intersection depends on
$\alpha$, $\beta$ only; it may be denoted by $f(\alpha,\beta)$, where the function is
symmetric in its two arguments. Mark a point $C$ on $AB$, the
$\alpha$ arc, dividing it into two arcs $AC$, $CB$, lengths $\alpha_1$ and $\alpha_2$. If the
$\beta$ arc intersects $AB$ at all, it must either intersect $AC$ or $CB$,
and it cannot intersect both. Hence

$$f(\alpha_1+\alpha_2,\beta)=f(\alpha_1,\beta)+f(\alpha_2,\beta).$$

Similarly

$$f(\alpha,\beta_1+\beta_2)=f(\alpha,\beta_1)+f(\alpha,\beta_2).$$

From these, it follows that

$$f(\alpha,\beta)=k\alpha\beta,$$

where $k$ is a constant.

Suppose, next, that the fixed curve consists of two arcs of great
circles, subtending $\alpha$ and $\beta$ at the centre, where $\alpha$ and $\beta$ are both
less than $\pi$, and that the other curve is an arc $\gamma\,(<\pi)$ of a great
circle. In this case, it is possible for the $\gamma$ arc to intersect both the
$\alpha$ arc and the $\beta$ arc. Denote the probability of this by $p$. Then the
probability that the $\gamma$ arc intersects the $\alpha$ arc only is $k\alpha\gamma-p$; and
the probability that it intersects the $\beta$ arc only is $k\beta\gamma-p$; so that
the probability, that the $\gamma$ arc intersects the fixed curve at least
once, is $k(\alpha+\beta)\gamma-p$. On the other hand, the probable number
of intersections of the fixed curve and the moving curve is

$$1\,(k\alpha\gamma-p)+1\,(k\beta\gamma-p)+2p=k(\alpha+\beta)\gamma.$$

It may be noticed that in the first case, where the curves can
only intersect once, the probability of intersecting and the
probable number of intersections are the same thing.

There is clearly no difficulty in extending this result to the
case, in which both fixed and moving curves consist of any
number of small $(<\pi)$ arcs of great circles. The result, if one
curve consists of arcs $\alpha_1,\alpha_2,\ldots,\alpha_m$ of great circles, and the second
of arcs $\beta_1,\beta_2,\ldots,\beta_n$, is to give

$$k\,(\alpha_1+\alpha_2+\cdots+\alpha_m)(\beta_1+\beta_2+\cdots+\beta_n)$$

for the probable number of intersections, i.e.

$$k\,l\,l',$$

where $l$ is the whole length of one curve and $l'$ that of the other.
Now whatever the curves may be, points may be marked on them
dividing them into arcs which are ultimately arcs of great circles.
The result is therefore general.

The constant $k$ is determined at once, by considering the case
of two great circles. Here $l=l'=2\pi$, and the number of points
is 2; so that

$$k(2\pi)^2=2.$$

Hence the required result is $\dfrac{l\,l'}{2\pi^2}$.
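For two arcs of great circles the result $ll'/2\pi^2$ can be tried by simulation. In this sketch, which is an illustration and not the argument of the text, the fixed arc lies along the equator from longitude $0$ to $\alpha$. The moving arc is $\cos t\,\mathbf u+\sin t\,\mathbf v$ for $0\le t\le\beta$, where $(\mathbf u,\mathbf v)$ is a random orthonormal pair. All names are hypothetical.

```python
import math
import random

def random_unit(rng):
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-9:
            return [c / norm for c in v]

def random_frame(rng):
    """Orthonormal pair (u, v): u uniform on the sphere, v uniform on the
    great circle perpendicular to u."""
    u = random_unit(rng)
    while True:
        w = random_unit(rng)
        d = sum(a * b for a, b in zip(u, w))
        v = [b - d * a for a, b in zip(u, w)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-9:
            return u, [c / norm for c in v]

def count_intersections(alpha, beta, u, v):
    # Normal to the plane of the moving great circle.
    n = (u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0])
    # The two great circles meet at +q and -q, where q is along (0,0,1) x n.
    dx, dy = -n[1], n[0]
    norm = math.hypot(dx, dy)
    if norm < 1e-12:
        return 0  # coincident circles, an event of probability zero
    hits = 0
    for sign in (1.0, -1.0):
        qx, qy = sign * dx / norm, sign * dy / norm
        longitude = math.atan2(qy, qx) % (2 * math.pi)
        t = math.atan2(qx * v[0] + qy * v[1], qx * u[0] + qy * u[1]) % (2 * math.pi)
        if longitude <= alpha and t <= beta:
            hits += 1
    return hits

def mean_intersections(alpha, beta, trials=80_000, seed=2):
    rng = random.Random(seed)
    return sum(count_intersections(alpha, beta, *random_frame(rng))
               for _ in range(trials)) / trials

print(mean_intersections(2.0, 1.5, trials=20_000), 2.0 * 1.5 / (2 * math.pi ** 2))
```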


CHAPTER  VII 
THEORY  OF  ERRORS 

25.  Practically  all  magnitude-determinations  are  liable  to 
error.  It  is  true  that,  if  a  basketful  of  apples  is  spread  out  on  a 
table  one  can  determine  with  certainty  the  number  of  apples; 
but  when  larger  and  larger  collections  of  distinct  objects  are  dealt 
with,  a  stage  must  be  reached  at  which  it  is  no  longer  possible 
to  be  certain  of  the  result  of  counting,  if  only  because  of  the 
length  of  time  the  process  takes. 

In  general,  a  magnitude-determination  cannot  be  reduced 
directly  to  a  process  of  counting.  It  nearly  always  involves  the 
observation  of  certain  coincidences,  such  as  that  of  a  pointer 
with  a  division  on  a  scale. 

The  imperfections,  both  of  our  senses  and  of  the  instruments 
used,  necessarily  imply  then  an  uncertainty  as  to  the  result  of 
the  determination ;  and  if  several  determinations  of  the  same 
magnitude  are  made,  they  will  be  in  general  different  from 
each  other. 

The  question  then  arises,  if  a  number  of  determinations 
have  given 

$$a_1,\ a_2,\ \ldots,\ a_n$$

as  the  values  of  a  certain  magnitude,  what  use  can  be  made 
of  this  result  ? 

If  the  a's  are  given,  without  any  indication  of  the  way  in 
which  they  were  arrived  at,  a  definite  result  can  be  obtained 
only  by  making  some  more  or  less  arbitrary  assumptions. 
Under  such  circumstances,  the  actual  value  chosen  for  the 
magnitude  is,  in  general,  the  arithmetic  mean  of  the  results,  viz. 

$$\frac1n(a_1+a_2+\cdots+a_n).$$

If $\alpha$ were the true value of the magnitude, the errors implied
in the data, reckoned positive when in excess, would be

$$a_i-\alpha,\qquad(i=1,2,\ldots,n).$$

The algebraic sum of the errors is

$$\sum_{i=1}^{n}(a_i-\alpha);$$

and the sum of the squares of the errors is

$$\sum_{i=1}^{n}(a_i-\alpha)^2.$$

If, in the last expression, $\alpha$ is regarded as a variable, its least
value is given by

$$\sum_{i=1}^{n}(a_i-\alpha)=0;$$

i.e., $\alpha$ is the arithmetic mean. Hence a choice, of the arithmetic
mean  of  the  a's  as  the  true  value,  implies  that  the  algebraic 
sum  of  the  errors  is  zero,  and  that  the  sum  of  the  squares  of 
the  errors  is  as  small  as  possible.  If  a  value  in  excess  of  the 
arithmetic  mean  is  taken,  it  is  implied  that  negative  errors 
are  more  numerous,  or  greater,  than  positive  errors;  and  if 
a  value  less  than  the  arithmetic  mean  is  taken,  it  is  implied 
that  the  positive  errors  exceed  the  negative. 

Conversely,  the  assumption  that  the  sum  of  the  squares  of 
the  errors  is  as  small  as  possible,  leads  to  the  arithmetic  mean 
as  the  required  value. 

This  however  is  only  one  of  a  great  variety  of  assumptions 
that  might  be  made  with  respect  to  the  errors.  In  general, 
each  assumption  will  give  a  different  value  for  the  magnitude. 

26. I. For instance, it might be assumed that the sum of the
absolute values of the errors, that is, their values apart from sign,
is as small as possible. If the $n$ given values are taken to be
in ascending order of magnitude, and the true value $\alpha$ is assumed
to lie between $a_r$ and $a_{r+1}$, the sum of the absolute values
of the errors is

$$(2r-n)\,\alpha+\sum_{r+1}^{n}a_i-\sum_{1}^{r}a_i.$$

If $2r>n$, this is least when $\alpha=a_r$; and if $2r<n$, it is least
when $\alpha=a_{r+1}$. If $2r=n$, it is independent of $\alpha$.

Hence, if $n$ is odd and equal to $2m+1$, the sum of the absolute
values of the errors is least when $\alpha=a_{m+1}$; while if $n$ is even and
equal to $2m$, the sum of the absolute values of the errors is
constant for all values of $\alpha$ from $a_m$ to $a_{m+1}$, and this value is
less than any other.

II. Again, if it is assumed that the sum of the $2m$th powers
of the errors is as small as possible, $\alpha$ will be given by

$$\sum_{1}^{n}(a_i-\alpha)^{2m-1}=0.$$

If $a_1$ and $a_n$ are the least and the greatest of the $a$'s, and $m$ is
sufficiently great, every term in this equation is very small
compared to either $(a_1-\alpha)^{2m-1}$ or $(a_n-\alpha)^{2m-1}$. Hence approximately

$$\alpha=\tfrac12(a_1+a_n).$$

It may be inferred that the assumption, that the sum of the
$2m$th powers of the errors is as small as possible, when $m>1$,
makes the determination of the true value depend more on the
larger and the smaller observed values than on those in the
middle of the series.
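The three assumptions (squares, absolute values, high even powers) may be compared on a small set of numbers. The sketch below is an illustration only: the data are invented, and the minimum is located by a ternary search, which is legitimate because each sum is a convex function of $\alpha$.

```python
def total_of_powers(data, alpha, power):
    """Sum of the given power of the absolute errors a_i - alpha."""
    return sum(abs(a - alpha) ** power for a in data)

def best_value(data, power, iterations=200):
    """Value of alpha making the sum of powers least (ternary search)."""
    lo, hi = min(data), max(data)
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if total_of_powers(data, m1, power) < total_of_powers(data, m2, power):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

data = [1.0, 2.0, 2.5, 4.0, 10.0]        # hypothetical determinations
mean = sum(data) / len(data)
middle = sorted(data)[len(data) // 2]
mid_range = (min(data) + max(data)) / 2

print(best_value(data, 2), mean)          # squares: arithmetic mean
print(best_value(data, 1), middle)        # absolute values: middle value
print(best_value(data, 200), mid_range)   # 200th powers: mean of extremes
```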

In  the  complete  absence  of  any  information  as  to  how  the 
a's  were  arrived  at,  it  would  always  then  seem  most  reasonable 
to  take  the  magnitude  equal  to  the  arithmetic  mean  of  the  a's, 
as  any  other  assumption  would  imply  that  either  an  excess  of 
positive  errors  or  an  excess  of  negative  errors  has  occurred. 

If the set of $n$ results are divided arbitrarily into two sets of
$r$ and $n-r$, and if

$$\frac1r\sum_{1}^{r}a_i=\alpha_1,\qquad\frac{1}{n-r}\sum_{r+1}^{n}a_i=\alpha_2,$$

(it is no longer supposed that the $a$'s are in order of magnitude),
then

$$\frac1n\sum_{1}^{n}a_i=\frac{r\alpha_1+(n-r)\alpha_2}{n}.$$

The quantity on the right is the weighted mean of $\alpha_1$ and $\alpha_2$,
attaching weights $r$ and $n-r$ to them, where $r$ is the number
of observations giving $\alpha_1$ and $n-r$ is the number giving $\alpha_2$.

Suppose now that two sets of determinations of a magnitude
are given, viz.

$$a_1,\ a_2,\ \ldots,\ a_m,$$
$$b_1,\ b_2,\ \ldots,\ b_n;$$

and that the only thing known about them is, that the first set
is obtained by better methods or by more accurate observers
than the second. The first set, about which by themselves
nothing is known, would give, by the rule of the arithmetic
mean,

$$\alpha_1=\frac1m\sum_{1}^{m}a_i;$$

and similarly the second set would give

$$\alpha_2=\frac1n\sum_{1}^{n}b_i$$

for the value required. In deducing the final result from $\alpha_1$
and $\alpha_2$, if nothing were known about them but the numbers of
observations from which they were derived, weights proportional
to these numbers would be used. When however it is known
that the $a$'s have been obtained by better methods than the $b$'s,
it is reasonable to attach a greater weight to $\alpha_1$ in consequence
of this knowledge, so that the final result would be

$$\frac{km\alpha_1+n\alpha_2}{km+n},$$

where $k$ is greater than unity. Until $k$ is known, all that
can be said from this formula is that the required value lies
between

$$\alpha_1\quad\text{and}\quad\frac{m\alpha_1+n\alpha_2}{m+n}.$$
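The dependence of the final result on $k$ may be made concrete. In the sketch below the numerical values of $\alpha_1$, $\alpha_2$, $m$, $n$ are invented for illustration and the function name is not from the text.

```python
def combined_value(alpha1, m, alpha2, n, k):
    """Weighted mean in which each of the m better determinations
    counts k times as much as each of the n others."""
    return (k * m * alpha1 + n * alpha2) / (k * m + n)

alpha1, m = 10.2, 4    # hypothetical mean and number of the better set
alpha2, n = 10.8, 6    # hypothetical mean and number of the other set

pooled = combined_value(alpha1, m, alpha2, n, 1)
for k in (1, 2, 5, 100):
    print(k, combined_value(alpha1, m, alpha2, n, k))
```

As $k$ increases from unity the result moves from the pooled mean towards $\alpha_1$, but no value of $k$ is singled out by the formula itself.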

What  is  always  wanted  practically  is  a  definite  result ;  and 
this  can  only  be  obtained  by  giving  k  a  definite  numerical  value. 
The  mere  statement  then  that  the  first  set  of  values  are 
obtained  by  a  better  method  than  the  second  is  of  little 
practical  use. 

What  is  necessary  to  get  a  definite  result  is  a  statement 
of  the  relative  weights  to  be  attached  respectively  to  any  one 
of  the  first  and  any  one  of  the  second  sets  of  values.  This 
clearly  cannot  be  reduced  to  a  rule,  but  must  be  a  matter  of 
judgment  in  each  particular  case. 

Even  if  it  is  known  that  the  sets  of  determinations  of  a 
magnitude  have  been  arrived  at  by  the  use  of  the  same  method 
and  by  equally  accurate  observers,  the  problem  of  obtaining 
a definite result from them remains largely indeterminate. If,
however, it is known that the observations made are liable to
error  following  some  definite  and  known  law,  the  problem  takes 
a  less  indeterminate  form. 

27. III. Suppose the observations are such that the probability
of an error in the determination lying between $x$ and
$x+\delta x$ is equal to $f(x)\,\delta x$, with the necessary relation

$$\int_{-\infty}^{\infty}f(x)\,dx=1.$$

If $\alpha$ is the true value of the magnitude, the probability that
$n$ determinations give values lying between $a_1$ and $a_1+\delta a_1$, ...,
$a_n$ and $a_n+\delta a_n$, is

$$f(a_1-\alpha)f(a_2-\alpha)\cdots f(a_n-\alpha)\,\delta a_1\,\delta a_2\cdots\delta a_n.$$

Denote by $p(\alpha)\,\delta\alpha$ the à priori probability that the true
value lies between $\alpha$ and $\alpha+\delta\alpha$. Then after the determinations
have been made, the à posteriori probability, that the true value
lies between $\alpha$ and $\alpha+\delta\alpha$, is by Bayes' formula

$$\frac{f(a_1-\alpha)f(a_2-\alpha)\cdots f(a_n-\alpha)\,p(\alpha)\,\delta\alpha}{\displaystyle\int_{-\infty}^{\infty}f(a_1-\alpha)f(a_2-\alpha)\cdots f(a_n-\alpha)\,p(\alpha)\,d\alpha},$$

and the probable value of $\alpha$ is

$$\frac{\displaystyle\int_{-\infty}^{\infty}\alpha f(a_1-\alpha)\cdots f(a_n-\alpha)\,p(\alpha)\,d\alpha}{\displaystyle\int_{-\infty}^{\infty}f(a_1-\alpha)\cdots f(a_n-\alpha)\,p(\alpha)\,d\alpha}.$$

If $p(\alpha)$ were known, the latter expression is a formula for
the probable value of $\alpha$, while the most probable value can be
deduced from the former one. Which of the two is chosen as
giving the true value must be a matter of judgment.

The difficulty is that $p(\alpha)$, from the nature of the case, can
never be known: so that some assumption must be made with
regard to it. It might be expected that the results would
depend to a large extent on the assumptions made with respect
to $p(\alpha)$; and this is no doubt the case if such assumptions are
made quite arbitrarily. But within the range of the $a$'s, it
would be quite unreasonable to assume rapid variation of $p(\alpha)$.
On the other hand, if $p(\alpha)$ varies slowly between $A$ and $B$,
where $A$ is less than the smallest and $B$ greater than the
greatest of the $a$'s, the values of the two above expressions
depend in general but slightly on the way in which $p(\alpha)$ varies.
The simplest assumption to make is, that $p(\alpha)$ is constant so
long as

$$f(a_1-\alpha)f(a_2-\alpha)\cdots f(a_n-\alpha)$$

has a sensible value.

With this assumption, the probable value of $\alpha$ is

$$\frac{\displaystyle\int_{-\infty}^{\infty}\alpha f(a_1-\alpha)f(a_2-\alpha)\cdots f(a_n-\alpha)\,d\alpha}{\displaystyle\int_{-\infty}^{\infty}f(a_1-\alpha)f(a_2-\alpha)\cdots f(a_n-\alpha)\,d\alpha},$$

and the most probable value of $\alpha$ is given by the equation

$$\frac{f'(a_1-\alpha)}{f(a_1-\alpha)}+\frac{f'(a_2-\alpha)}{f(a_2-\alpha)}+\cdots+\frac{f'(a_n-\alpha)}{f(a_n-\alpha)}=0.$$

28. IV. That this last equation will not, in general, give the
arithmetic mean for the most probable value is obvious; but it
may be asked, for what law of error is the arithmetic mean the
most probable value?

Write

$$\frac{f'(x)}{f(x)}=F(x).$$

Then if

$$\alpha=\frac1n(a_1+a_2+\cdots+a_n),$$

the equation

$$F(x_1)+F(x_2)+\cdots+F(x_n)=0$$

is true when

$$x_1+x_2+\cdots+x_n=0,$$

so that

$$F(x_1)+F(x_2)+\cdots+F(-x_1-x_2-\cdots-x_{n-1})=0$$

is an identity. Hence

$$F'(x_1)=F'(-x_1-x_2-\cdots-x_{n-1})=C,$$

$$F(x)=Cx+C',$$

and

$$F(x_1)+F(x_2)+\cdots+F(x_n)=C(x_1+x_2+\cdots+x_n)+nC',$$

so that $C'=0$.

Hence

$$\frac{f'(x)}{f(x)}=Cx,\qquad f(x)=Ae^{\frac12Cx^2}.$$

Now the relation

$$\int_{-\infty}^{\infty}f(x)\,dx=1$$

shews that $C$ must be negative, say $-2h$, and gives

$$A=\sqrt{\frac h\pi}.$$

Hence the only law of error which makes the arithmetic
mean the most probable value is that for which

$$f(x)=\sqrt{\frac h\pi}\,e^{-hx^2}.$$

This is called Gauss's law of error.


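It may be noticed that, with this law, the probable value
of $\alpha$ is

$$\frac{\displaystyle\int_{-\infty}^{\infty}\alpha\,e^{-h\sum_1^n(a_i-\alpha)^2}\,d\alpha}{\displaystyle\int_{-\infty}^{\infty}e^{-h\sum_1^n(a_i-\alpha)^2}\,d\alpha}=\frac{\displaystyle\int_{-\infty}^{\infty}\alpha\,e^{-hn\left(\alpha-\frac1n\sum_1^na_i\right)^2}\,d\alpha}{\displaystyle\int_{-\infty}^{\infty}e^{-hn\left(\alpha-\frac1n\sum_1^na_i\right)^2}\,d\alpha}=\frac1n\sum_1^na_i;$$

so that, with Gauss's law, both the probable value and the most
probable value are the arithmetic mean.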
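The coincidence of the probable and the most probable value under Gauss's law may be seen numerically, taking $p(\alpha)$ constant. In the sketch below the data, the grid and the function names are illustrative assumptions. A second law of error, proportional to $e^{-|x|}$, is introduced here only for contrast and is not discussed in the text.

```python
import math

def probable_and_most_probable(data, log_f, lo, hi, steps=20_000):
    """Probable value (mean) and most probable value (maximum) of alpha
    on a grid, for a law of error with logarithm log_f."""
    width = (hi - lo) / steps
    grid = [lo + (k + 0.5) * width for k in range(steps)]
    logs = [sum(log_f(a - alpha) for a in data) for alpha in grid]
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]   # rescaled to avoid underflow
    mean = sum(a * w for a, w in zip(grid, weights)) / sum(weights)
    mode = grid[weights.index(max(weights))]
    return mean, mode

def log_gauss(x):
    return -x * x        # logarithm of e^{-h x^2} with h = 1, constant dropped

def log_contrast(x):
    return -abs(x)       # contrasting law e^{-|x|}, an assumption

data = [1.0, 2.0, 2.5, 4.0, 10.0]    # hypothetical determinations
gauss_mean, gauss_mode = probable_and_most_probable(data, log_gauss, -5.0, 15.0)
other_mean, other_mode = probable_and_most_probable(data, log_contrast, -5.0, 15.0)
print(gauss_mean, gauss_mode, sum(data) / len(data))
print(other_mean, other_mode)
```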

29.  Gauss's  law  of  error  makes  positive  and  negative  errors 
equally probable. It will therefore certainly not apply to observations
which are affected by systematic errors such as the
index  error  of  a  sextant.  Assuming  that  a  set  of  observations 
have  been  corrected  in  some  way  for  systematic  error,  it  may 
be  asked  whether  there  is  any  reason  to  expect  that  the  law  of 
their  errors  will  be  Gauss's  law. 
In general, the actual error of an observation is the sum
of a number of minor errors due to different causes and having
very various values. To take a specific case, suppose that the
actual error is the sum of $N$ minor errors, each of which has
one of the values $a_1, a_2, \ldots, a_s$. Denote by $p_i$ the probability,
that any minor error has the value $a_i$, so that

$$\sum_1^s p_i=1.$$

Then if

$$\sum_1^s n_i=N,$$

the probability, that the actual error has the value

$$\sum_1^s n_ia_i,$$

is

$$\frac{N!}{n_1!\,n_2!\cdots n_s!}\,p_1^{n_1}p_2^{n_2}\cdots p_s^{n_s},$$

which is the coefficient of $x^{\sum_1^s n_ia_i}$ in

$$\Big(\sum_1^s p_ix^{a_i}\Big)^N.$$

Hence the probability, that the actual error lies between
$r_1$ and $r_2$, is the sum of the coefficients of those powers of $x$
in this expression whose indices lie between $r_1$ and $r_2$.

In the simplest case, viz. $s=2$, $p_1=p_2=\tfrac12$, $a_1=-a_2$, it has
already  been  seen  in  Chapter  iv  that  this  leads  to  Gauss's  law ; 
so  that,  if  the  actual  error  is  made  up  of  a  sufficiently  large 
number  of  minor  errors  of  the  same  magnitude,  each  of  which 
is  equally  likely  to  be  positive  or  negative,  then  it  follows 
Gauss's  law. 

30.  This,  no  doubt,  is  a  very  special  assumption;  but  the 
result  holds  good  under  much  more  general  conditions. 

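In the above expression for the probability, $q$, of an error
$\sum_1^s n_ia_i$, put

$$n_i=p_iN+x_i,\qquad(i=1,2,\ldots,s).$$

Then, when the factorials are replaced by their approximate
expressions,

$$q=\frac{1}{(2\pi)^{\frac{s-1}{2}}}\cdot\frac{N^{N+\frac12}\,p_1^{p_1N+x_1}\cdots p_s^{p_sN+x_s}}{(p_1N+x_1)^{p_1N+x_1+\frac12}\cdots(p_sN+x_s)^{p_sN+x_s+\frac12}}=\frac{1}{(2\pi N)^{\frac{s-1}{2}}\sqrt{p_1p_2\cdots p_s}}\cdot\frac1D,$$

where

$$D=\Big(1+\frac{x_1}{p_1N}\Big)^{p_1N+x_1+\frac12}\cdots\Big(1+\frac{x_s}{p_sN}\Big)^{p_sN+x_s+\frac12}.$$

This gives

$$\log D=\sum_1^s\big(p_iN+x_i+\tfrac12\big)\log\Big(1+\frac{x_i}{p_iN}\Big)=\sum_1^s\frac{x_i^2}{2p_iN}+\ldots,$$

since $\sum_1^s x_i=0$.

Hence, when $N$ is large enough,

$$q=\frac{1}{(2\pi N)^{\frac{s-1}{2}}\sqrt{p_1p_2\cdots p_s}}\,e^{-\sum_1^s\frac{x_i^2}{2p_iN}}.$$

It follows that the probability of an error lying between
$r_1$ and $r_2$ is $\sum q$, where the sum is taken for those values of
the $x$'s (with integral differences) for which

$$\sum_1^s x_i=0,\qquad r_1\le N\sum_1^s p_ia_i+\sum_1^s a_ix_i\le r_2.$$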

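31. So far, no supposition has been made about the minor
errors. Suppose now that the probable value of a minor error
is zero, so that

$$\sum_1^s p_ia_i=0.$$

The equations of condition are then

$$\sum_1^s x_i=0,\qquad r_1\le\sum_1^s a_ix_i\le r_2.$$

Consider now $\sum q$ for all values of the $x$'s (with integral
differences) which satisfy the relations

$$0\le\sum x_i\le l,\qquad r_1\le\sum a_ix_i\le r_2.$$

When $N$ is large enough, this is sensibly

$$\int\!\cdots\!\int q\,dx_1\,dx_2\cdots dx_s,$$

taken over the range given by the above inequalities.

Writing

$$x_i=\sqrt{p_i}\,y_i,\qquad(i=1,2,\ldots,s),$$

the integral becomes

$$\frac{1}{(2\pi N)^{\frac{s-1}{2}}}\int\!\cdots\!\int e^{-\frac{1}{2N}\sum y_i^2}\,dy_1\,dy_2\cdots dy_s,$$

taken over the range

$$0\le\sum\sqrt{p_i}\,y_i\le l,\qquad r_1\le\sum\sqrt{p_i}\,a_iy_i\le r_2.$$

Now make an orthogonal substitution, such that

$$z_1=\frac{\sum\sqrt{p_i}\,a_iy_i}{\sqrt{\sum p_ia_i^2}},\qquad z_2=\sum\sqrt{p_i}\,y_i,$$

while $z_3,\ldots,z_s$ are chosen consistently with these relations,
which can certainly always be done. The integral then becomes

$$\frac{1}{(2\pi N)^{\frac{s-1}{2}}}\int\!\cdots\!\int e^{-\frac{1}{2N}\sum z_i^2}\,dz_1\,dz_2\cdots dz_s,$$

taken over the range

$$\frac{r_1}{\sqrt{\sum p_ia_i^2}}\le z_1\le\frac{r_2}{\sqrt{\sum p_ia_i^2}},\qquad 0\le z_2\le l;$$

i.e.

$$\frac{1}{\sqrt{2\pi N}}\int_0^l e^{-\frac{z_2^2}{2N}}\,dz_2\int_{r_1/\sqrt{\sum p_ia_i^2}}^{r_2/\sqrt{\sum p_ia_i^2}}e^{-\frac{z_1^2}{2N}}\,dz_1,$$

or

$$\frac{l}{\sqrt{2\pi N\sum p_ia_i^2}}\int_{r_1}^{r_2}e^{-\frac{r^2}{2N\sum p_ia_i^2}}\,dr,$$

if $l$ is small enough. Finally $\sum q$, subject to the conditions

$$\sum x_i=0,\qquad r\le\sum a_ix_i\le r+dr,$$

is

$$\frac{1}{\sqrt{2\pi N\sum p_ia_i^2}}\,e^{-\frac{r^2}{2N\sum p_ia_i^2}}\,dr.$$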

This  completes  the  formal  proof  that  the  error  follows  Gauss's 
law,  if  it  arises  as  the  sum  of  a  sufficiently  large  number  of 
minor  errors,  of  which  the  probable  value  is  zero. 

It should be noticed that the quantity $\sum p_ia_i^2$, occurring in the
above  expression,  is  the  probable  value  of  the  square  of  a  minor 
error. 
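The passage from the coefficients of $\big(\sum p_ix^{a_i}\big)^N$ to Gauss's law can be watched in a small case. The sketch below takes three integral minor errors with probable value zero. The values, probabilities and $N$ are invented for illustration, and the half-unit widening of the limits is an added continuity correction, not part of the text.

```python
import math

def sum_distribution(values, probs, N):
    """Coefficients of (sum p_i x^{a_i})^N, i.e. the exact distribution of
    the sum of N minor errors taking integral values."""
    dist = {0: 1.0}
    for _ in range(N):
        nxt = {}
        for total, q in dist.items():
            for a, p in zip(values, probs):
                nxt[total + a] = nxt.get(total + a, 0.0) + q * p
        dist = nxt
    return dist

def gauss_probability(r1, r2, spread):
    """Integral from r1 to r2 of exp(-r^2 / (2 spread)) / sqrt(2 pi spread)."""
    s = math.sqrt(2 * spread)
    return 0.5 * (math.erf(r2 / s) - math.erf(r1 / s))

values, probs, N = [-1, 0, 2], [0.5, 0.25, 0.25], 400   # sum of p_i a_i is 0
spread = N * sum(p * a * a for a, p in zip(values, probs))  # N * sum p_i a_i^2

dist = sum_distribution(values, probs, N)
exact = sum(q for r, q in dist.items() if -24 <= r <= 24)
approx = gauss_probability(-24.5, 24.5, spread)
print(exact, approx)
```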

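32. V. If each component error follows Gauss's law, it may
be shewn that the resultant error also follows the law, quite independently
of the number of components being great. Suppose
that the actual error is the sum of two components, and that
the probabilities of the two components lying between $x_1$ and
$x_1+\delta x_1$, $x_2$ and $x_2+\delta x_2$, respectively are

$$\sqrt{\frac{h_1}{\pi}}\,e^{-h_1x_1^2}\,\delta x_1\quad\text{and}\quad\sqrt{\frac{h_2}{\pi}}\,e^{-h_2x_2^2}\,\delta x_2.$$

The probability that the component errors, assumed to be
independent, satisfy these conditions simultaneously is

$$\frac{\sqrt{h_1h_2}}{\pi}\,e^{-h_1x_1^2-h_2x_2^2}\,\delta x_1\,\delta x_2.$$

Put

$$x_1+x_2=X,\qquad-x_1+x_2=Y,$$

so that

$$\delta x_1\,\delta x_2=\tfrac12\,\delta X\,\delta Y.$$

Then the probability that the sum and the difference of the
two component errors lie respectively between $X$ and $X+\delta X$,
$Y$ and $Y+\delta Y$, is

$$\frac{\sqrt{h_1h_2}}{2\pi}\,e^{-\frac14h_1(X-Y)^2-\frac14h_2(X+Y)^2}\,\delta X\,\delta Y.$$

Hence the probability, that the sum of the component errors lies
between $X$ and $X+\delta X$, is

$$\begin{aligned}&\frac{\sqrt{h_1h_2}}{2\pi}\,\delta X\int_{-\infty}^{\infty}e^{-\frac14h_1(X-Y)^2-\frac14h_2(X+Y)^2}\,dY\\&\quad=\frac{\sqrt{h_1h_2}}{2\pi}\,\delta X\int_{-\infty}^{\infty}e^{-\frac{h_1+h_2}{4}\left(Y-\frac{h_1-h_2}{h_1+h_2}X\right)^2-\frac{h_1h_2}{h_1+h_2}X^2}\,dY\\&\quad=\sqrt{\frac{h_1h_2}{(h_1+h_2)\pi}}\;e^{-\frac{h_1h_2}{h_1+h_2}X^2}\,\delta X.\end{aligned}$$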

This  clearly  involves  the  consequence  that,  if  a  number  of 
independent  component  errors  all  follow  Gauss's  law  and  if  the 
resultant  error  is  compounded  linearly  from  them  in  any  way, 
then  it  also  follows  Gauss's  law. 
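The result for two components may be checked numerically. The sketch below is an illustration under stated assumptions rather than part of the original argument: the function names and the sample values h1 = 2, h2 = 0.5 are hypothetical, and the density of the sum is obtained by a simple midpoint-rule convolution. It is compared with Gauss's law with precision-constant h1 h2 / (h1 + h2).

```python
import math


def gauss_density(x, h):
    """Gauss's law with precision-constant h: sqrt(h/pi) * exp(-h x^2)."""
    return math.sqrt(h / math.pi) * math.exp(-h * x * x)


def density_of_sum(X, h1, h2, half_width=12.0, steps=4000):
    """Midpoint-rule estimate of the density of x1 + x2 at X (a convolution)."""
    dx = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps):
        x1 = -half_width + (k + 0.5) * dx
        total += gauss_density(x1, h1) * gauss_density(X - x1, h2)
    return total * dx


def combined_precision(h1, h2):
    """Precision-constant of the sum of two independent Gaussian errors."""
    return h1 * h2 / (h1 + h2)


h1, h2 = 2.0, 0.5  # hypothetical precision-constants
H = combined_precision(h1, h2)
for X in (0.0, 0.7, 1.9):
    print(X, density_of_sum(X, h1, h2), gauss_density(X, H))
```

Since 1/H = 1/h1 + 1/h2, repeating the step shows how the precision of any linear compound of Gaussian components can be built up one component at a time.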


CHAPTER  VIII 

GAUSS'S    LAW   OF    ERRORS 

33. As has already been stated, when the probability of an error lying between $x$ and $x + \delta x$ is

$$\sqrt{\frac{h}{\pi}}\, e^{-hx^2}\, \delta x,$$

the errors are said to follow Gauss's law.

The  constant  h  in  this  formula  is  clearly  a  kind  of  measure 
of  the  precision  of  the  observations,  since  the  probability  of  an 
error  of  given  magnitude  diminishes  as  h  increases.  Since 
positive  and  negative  errors  are  equally  likely,  the  probable 
error,  as  defined  in  Chapter  iv,  is  zero. 

If

$$\int_{-d}^{d} \sqrt{\frac{h}{\pi}}\, e^{-hx^2}\, dx = \tfrac{1}{2},$$

the probability of an error lying within the interval from $-d$ to $d$, is equal to the probability of its lying without this interval. The interval from $-d$ to $d$ is often spoken of as the 50 per cent. zone. Now the preceding equation may be written

$$\frac{2}{\sqrt{\pi}} \int_0^{d\sqrt{h}} e^{-x^2}\, dx = \tfrac{1}{2}.$$

In an appendix (p. 103), a table of values of

$$\frac{2}{\sqrt{\pi}} \int_0^{x} e^{-x^2}\, dx$$

is given for values of $x$ from 0 to 3 proceeding by differences of 0.1, from which by interpolation the value of the integral for intermediate values of $x$ can be determined with considerable accuracy. For instance, the table at once gives

$$d\sqrt{h} = 0.477 \text{ to three places:}$$

which may be written

$$h = \frac{0.910}{(2d)^2} \quad \text{or} \quad 2d = \frac{0.954}{\sqrt{h}},$$

giving the precision-constant in terms of the breadth of the 50 per cent. zone, and vice versa.

If

$$\frac{2}{\sqrt{\pi}} \int_0^{d'\sqrt{h}} e^{-x^2}\, dx = 0.998,$$

the probability of an error lying outside the zone from $-d'$ to $d'$ is $\tfrac{1}{500}$; or one may say the zone from $-d'$ to $d'$ is practically certain to contain all the errors if the number of observations is not too large. Now the table gives

$$d'\sqrt{h} = 2.2,$$

so that

$$d' = 4.61\, d,$$

i.e. the breadth of this zone of practical certainty is 4.61 times the breadth of the 50 per cent. zone.

Half the breadth of the 50 per cent. zone, i.e. $d$, is often called
the  probable  error,  though  the  phrase  is  not  used  in  the  sense 
defined  in  Chap.  iv.  There  is  no  risk  of  confusion  if  it  is 
remembered  that,  in  this  connection,  the  probable  error  is  a 
positive  quantity  d  such  that  the  magnitude  of  an  error,  apart 
from  sign,  is  equally  likely  to  be  greater  or  less  than  d. 
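The two numerical constants of this article can be recovered without the printed table. The following is a minimal sketch, assuming only that the tabulated integral is the standard error function (`math.erf` in Python); the helper name `solve_erf` is hypothetical.

```python
import math


def solve_erf(target, lo=0.0, hi=6.0, tol=1e-12):
    """Bisection for t with (2/sqrt(pi)) * integral_0^t exp(-x^2) dx = target."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if math.erf(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


d_sqrt_h = solve_erf(0.5)          # d * sqrt(h) for the 50 per cent. zone
inside_certain = math.erf(2.2)     # tabulated integral at d' * sqrt(h) = 2.2
ratio = 2.2 / d_sqrt_h             # d' / d

print("d * sqrt(h)              =", round(d_sqrt_h, 3))
print("integral at 2.2          =", round(inside_certain, 4))
print("breadth ratio d' / d     =", round(ratio, 2))
```

The value 2.2 is the tabular argument nearest to the root of the equation with right-hand side 0.998, which is why the ratio is computed from 2.2 rather than from the exact root.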

Mean  Error :  Error  of  Mean  Square. 

34. There are two other multiples of $1/\sqrt{h}$ which are often used in practice.

The probable value, in the ordinary sense, of $|x|$, that is of the error apart from sign, is

$$\sqrt{\frac{h}{\pi}} \int_{-\infty}^{\infty} |x|\, e^{-hx^2}\, dx = 2\sqrt{\frac{h}{\pi}} \int_0^{\infty} x\, e^{-hx^2}\, dx = \frac{1}{\sqrt{\pi h}} = \frac{0.564}{\sqrt{h}} \text{ to three places.}$$

This is called the mean error.

The probable value of the square of the error, is

$$\sqrt{\frac{h}{\pi}} \int_{-\infty}^{\infty} x^2 e^{-hx^2}\, dx = \frac{1}{2h}.$$

The square root of this, viz. $0.707/\sqrt{h}$ to three places, is called the error of mean square.


The result of differentiating the equation

$$\int_{-\infty}^{\infty} e^{-hx^2}\, dx = \sqrt{\frac{\pi}{h}}$$

$n$ times with respect to $h$ is

$$\int_{-\infty}^{\infty} x^{2n} e^{-hx^2}\, dx = \frac{1 \cdot 3 \cdots (2n-1)}{2^n h^n} \sqrt{\frac{\pi}{h}}.$$

Thus the probable value of the $2n$th power of the error is

$$1 \cdot 3 \cdots (2n-1) \left(\frac{1}{2h}\right)^{n}.$$

Whatever the value of $h$, this quantity increases continually with $n$ when $n$ is greater than $h$; it takes its least value when $n$ is the integer between $h - \tfrac{1}{2}$ and $h + \tfrac{1}{2}$. The bearing of the occasional very large errors which Gauss's law implies is perhaps best realized in this way.
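The multiples of $1/\sqrt{h}$ and the even moments given in this article can be checked by direct quadrature. This is only a sketch under stated assumptions: the function names are hypothetical, the integrals are estimated by a plain midpoint rule, and h = 3.2 is an arbitrary value chosen to illustrate where the least moment falls.

```python
import math


def even_moment(n, h):
    """Probable value of the 2n-th power of the error: 1*3*...*(2n-1) / (2h)^n."""
    odd_product = 1
    for k in range(1, n + 1):
        odd_product *= 2 * k - 1
    return odd_product / (2.0 * h) ** n


def numeric_moment(power, h, steps=20000):
    """Midpoint-rule estimate of the probable value of |x|^power under Gauss's law."""
    half_width = 12.0 / math.sqrt(h)   # the density is negligible beyond this
    dx = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps):
        x = -half_width + (k + 0.5) * dx
        total += abs(x) ** power * math.sqrt(h / math.pi) * math.exp(-h * x * x)
    return total * dx


mean_error = numeric_moment(1, 1.0)                 # about 0.564 / sqrt(h) with h = 1
error_of_mean_square = math.sqrt(even_moment(1, 1.0))  # about 0.707 / sqrt(h) with h = 1
least_n = min(range(1, 10), key=lambda n: even_moment(n, 3.2))

print("mean error (h = 1):", round(mean_error, 3))
print("error of mean square (h = 1):", round(error_of_mean_square, 3))
print("n giving the least 2n-th moment for h = 3.2:", least_n)
```

For h = 3.2 the integer between h - 1/2 and h + 1/2 is 3, and the moments grow without limit beyond it, which is the numerical face of the occasional very large errors mentioned above.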

35. The probability that two observations give errors lying respectively between $x_1$ and $x_1 + \delta x_1$, $x_2$ and $x_2 + \delta x_2$, is

$$\frac{h}{\pi}\, e^{-h(x_1^2 + x_2^2)}\, \delta x_1\, \delta x_2.$$

Hence the probability that two observations give errors for which $|x_1 - x_2|$ is equal to or less than $l$, is

$$\frac{h}{\pi} \iint e^{-h(x_1^2 + x_2^2)}\, dx_1\, dx_2,$$

taken over the range for which

$$-l \le x_1 - x_2 \le l.$$

As before, put

$$x_2 + x_1 = Y, \qquad x_2 - x_1 = X,$$

so that $\delta x_1\, \delta x_2 = \tfrac{1}{2}\, \delta X\, \delta Y$.

Then the required probability is

$$\frac{h}{2\pi} \iint e^{-\frac{1}{2}h(X^2 + Y^2)}\, dX\, dY,$$

taken over the range $-l < X < l$,

that is,

$$\frac{h}{2\pi} \int_{-\infty}^{\infty} e^{-\frac{1}{2}hY^2}\, dY \int_{-l}^{l} e^{-\frac{1}{2}hX^2}\, dX = \frac{2}{\sqrt{\pi}} \int_0^{l\sqrt{h/2}} e^{-x^2}\, dx.$$

In particular, if $l\sqrt{h/2} = 0.477$ or $l = 0.707 \times 2d$, this probability is $\tfrac{1}{2}$. The difference of the errors in two observations is therefore equally likely to be greater or less than 0.707 times the breadth of the 50 per cent. zone.

This may be expressed by saying that, in two observations, the probable "spread" of the errors is 0.707 times the breadth of the 50 per cent. zone. The question of determining, in this sense, the probable spread of the errors in a set of $n$ observations may be treated as follows.

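The probability, that the error of the $i$th of $n$ observations lies between $x$ and $x + \delta x$, and that the errors of the other $n - 1$ lie between $x$ and $x + l$, is

$$\sqrt{\frac{h}{\pi}}\, e^{-hx^2}\, \delta x \left( \sqrt{\frac{h}{\pi}} \int_x^{x+l} e^{-hy^2}\, dy \right)^{n-1}.$$

Therefore the probability, that the $i$th of the $n$ observations has the algebraically smallest error and that the spread does not exceed $l$, is

$$\int_{-\infty}^{\infty} \sqrt{\frac{h}{\pi}}\, e^{-hx^2}\, dx \left( \sqrt{\frac{h}{\pi}} \int_x^{x+l} e^{-hy^2}\, dy \right)^{n-1}.$$

Now any one of the observations may have the algebraically smallest error; and the cases so obtained are mutually exclusive. It follows that the probability, that the spread of the errors of $n$ observations does not exceed $l$, is

$$n \left( \frac{h}{\pi} \right)^{n/2} \int_{-\infty}^{\infty} e^{-hx^2}\, dx \left( \int_x^{x+l} e^{-hy^2}\, dy \right)^{n-1}.$$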

It  may  be  observed  that  this  result  can  also  be  established  by 
the  method  of  Example  x,  Chapter  ii. 

The formula may be written

$$\frac{n}{\pi^{n/2}} \int_{-\infty}^{\infty} e^{-x^2} \left( \int_x^{x + l\sqrt{h}} e^{-y^2}\, dy \right)^{n-1} dx,$$

and for given numerical values of $n$ and $l\sqrt{h}$, its value may be calculated by means of the table already referred to. When this is done for various values of $l\sqrt{h}$ and the same $n$, the particular value of $l\sqrt{h}$, which makes the probability $\tfrac{1}{2}$, may be approximately obtained by interpolation. In this way the following table is arrived at, giving the probable spread in terms of the breadth of the 50 per cent. zone:

    n        4       5       6       8       10      12

    l/2d     1.47    1.67    1.84    2.07    2.24    2.37

It  is  easy  to  see  that  the  probable  spread  increases  without 
limit  as  n  increases. 
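The interpolation described above can be replaced by direct quadrature and bisection. The sketch below is an illustration, not the procedure used for the printed table: the function names are hypothetical, the outer integral is estimated by a midpoint rule, and the inner integral is expressed through the standard error function.

```python
import math


def bisect(f, lo, hi, tol=1e-8):
    """Root of an increasing function f on [lo, hi] by bisection."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def spread_probability(n, L, half_width=7.0, steps=2000):
    """Probability that the spread of n errors does not exceed l, with L = l * sqrt(h).

    Evaluates (n / pi^(n/2)) * int exp(-x^2) (int_x^{x+L} exp(-y^2) dy)^(n-1) dx.
    """
    dx = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps):
        x = -half_width + (k + 0.5) * dx
        inner = 0.5 * (math.erf(x + L) - math.erf(x))  # (1/sqrt(pi)) * inner integral
        total += math.exp(-x * x) / math.sqrt(math.pi) * inner ** (n - 1)
    return n * total * dx


def probable_spread_ratio(n):
    """Probable spread divided by the breadth 2d of the 50 per cent. zone."""
    L = bisect(lambda t: spread_probability(n, t) - 0.5, 0.0, 10.0)
    d_sqrt_h = bisect(lambda t: math.erf(t) - 0.5, 0.0, 2.0)
    return L / (2.0 * d_sqrt_h)


for n in (2, 4, 5, 6, 8, 10, 12):
    print(n, round(probable_spread_ratio(n), 2))
```

For n = 2 the quadrature reproduces the closed value $1/\sqrt{2}$ obtained earlier in this article, which gives a convenient check on the routine before it is used for larger n.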

Combination  of  Determinations. 

36.  Assuming  that  the  errors  of  observation  of  a  certain 
quantity  do,  in  fact,  follow  Gauss's  law,  the  question  arises  as 
to what deductions can be drawn from a given set of determinations.

Denote by $x_1, x_2, \ldots, x_n$, the numerical values obtained in a set of $n$ determinations: by $x_0$ the unknown true value of the quantity to be determined: and by $h$ the unknown value of the precision-constant for the observations.

The probability of getting a set of $n$ determinations lying respectively between $x_1$ and $x_1 + dx_1$, $x_2$ and $x_2 + dx_2$, ..., $x_n$ and $x_n + dx_n$, is

$$\delta p = \left( \frac{h}{\pi} \right)^{n/2} e^{-h \sum_{1}^{n} (x_i - x_0)^2}\, dx_1\, dx_2 \cdots dx_n,$$

and the probability of getting a set of $n$ determinations, which satisfy definite conditions, is the integral of $\delta p$ over the range determined by the conditions. In particular, the integral of $\delta p$ for all real values of the $n$ variables is unity.

It is easy to verify that

$$\sum_{1}^{n} (x_i - x_0)^2 = n \left( x_0 - \frac{1}{n} \sum_{1}^{n} x_i \right)^2 + n \left\{ \frac{1}{n} \sum_{1}^{n} x_i^2 - \left( \frac{1}{n} \sum_{1}^{n} x_i \right)^2 \right\}.$$

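Introduce new variables defined by

$$y_1 = \frac{1}{n} \sum_{1}^{n} (x_i - x_0),$$

$$y_j = \sqrt{\frac{n - j + 1}{n(n - j + 2)}} \left\{ x_{j-1} - \frac{1}{n - j + 1} (x_j + x_{j+1} + \cdots + x_n) \right\}$$

for $j = 2, 3, \ldots, n$;

and denote by $-J$ a numerical constant, the Jacobian of the $x$'s with respect to the $y$'s. With these new variables, it will be found that

$$\frac{1}{n} \sum_{1}^{n} x_i^2 - \left( \frac{1}{n} \sum_{1}^{n} x_i \right)^2 = \sum_{2}^{n} y_i^2,$$

so that

$$\delta p = -J \left( \frac{h}{\pi} \right)^{n/2} e^{-nh \sum_{1}^{n} y_i^2}\, dy_1\, dy_2 \cdots dy_n.$$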
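The identity on which this change of variables rests, namely that the sum of the squares of the $x_i - x_0$ equals $n$ times the sum of the squares of the $y$'s, can be confirmed numerically. The following sketch is illustrative only: the function name `burnside_y`, the three readings and the trial value of $x_0$ are assumptions made for the example.

```python
import math


def burnside_y(xs, x0):
    """New variables y_1, ..., y_n built from the determinations x_1, ..., x_n."""
    n = len(xs)
    ys = [sum(x - x0 for x in xs) / n]            # y_1
    for j in range(2, n + 1):
        tail = xs[j - 1:]                          # x_j, ..., x_n (1-based indices)
        scale = math.sqrt((n - j + 1) / (n * (n - j + 2)))
        ys.append(scale * (xs[j - 2] - sum(tail) / (n - j + 1)))
    return ys


xs = [11.921, 11.937, 11.918]   # hypothetical readings
x0 = 11.925                     # hypothetical true value
n = len(xs)
ys = burnside_y(xs, x0)

lhs = sum((x - x0) ** 2 for x in xs)
rhs = n * sum(y * y for y in ys)
spread_term = sum(x * x for x in xs) / n - (sum(xs) / n) ** 2
tail_term = sum(y * y for y in ys[1:])

print(lhs, rhs)
print(spread_term, tail_term)
```

The second pair of printed numbers checks the relation between the mean-square deviation of the readings and the sum of $y_2^2, \ldots, y_n^2$, which does not involve $x_0$ at all.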

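The integral of

$$e^{-nh \sum_{2}^{n} y_i^2}\, dy_2\, dy_3 \cdots dy_n,$$

taken over the range defined by

$$\beta^2 \le \sum_{2}^{n} y_i^2 \le (\beta + d\beta)^2,$$

is the product of $e^{-nh\beta^2}$, and the integral of

$$dy_2\, dy_3 \cdots dy_n,$$

taken over the same range. Now the integral of this last differential over the range defined by

$$\beta^2 \ge \sum_{2}^{n} y_i^2 \ge 0$$

is clearly $C\beta^{n-1}$, where $C$ is a numerical constant depending only on $n$. Hence the integral of

$$e^{-nh \sum_{2}^{n} y_i^2}\, dy_2\, dy_3 \cdots dy_n,$$

taken over the range defined by

$$\beta^2 \le \sum_{2}^{n} y_i^2 \le (\beta + d\beta)^2,$$

is

$$C' e^{-nh\beta^2} \beta^{n-2}\, d\beta,$$

where $C'$ is a numerical constant; and the integral of $\delta p$, taken over the same range, is

$$-JC' \left( \frac{h}{\pi} \right)^{n/2} e^{-nh(y_1^2 + \beta^2)}\, \beta^{n-2}\, dy_1\, d\beta.$$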
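Now again introduce new variables defined by

$$u = \frac{y_1}{\beta}, \qquad v = \beta\sqrt{h}.$$

For these,

$$\frac{\partial(y_1, \beta)}{\partial(u, v)} = \frac{v}{h};$$

so that the immediately preceding differential becomes

$$k\, e^{-nv^2(1 + u^2)}\, v^{n-1}\, du\, dv,$$

where $k$ is a numerical constant.

This is the probability that $y_1/\beta$ shall lie between $u$ and $u + du$, and $\beta\sqrt{h}$ shall simultaneously lie between $v$ and $v + dv$.

Now

$$\int_0^{\infty} e^{-nv^2(1 + u^2)}\, v^{n-1}\, dv = \frac{k_1}{(1 + u^2)^{n/2}},$$

where $k_1$ is a numerical constant, and

$$\int_{-\infty}^{\infty} e^{-nv^2u^2}\, du = \frac{1}{v} \sqrt{\frac{\pi}{n}}.$$

Hence the probability, that $y_1/\beta$ lies between $u$ and $u + du$, is

$$\frac{k k_1\, du}{(1 + u^2)^{n/2}};$$

and the probability, that $\beta\sqrt{h}$ lies between $v$ and $v + dv$, is

$$k \sqrt{\frac{\pi}{n}}\, e^{-nv^2}\, v^{n-2}\, dv.$$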

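37. The conclusions, which can be drawn from the given set of determinations, are therefore:

(i) the probability, that

$$\frac{\dfrac{1}{n} \sum_{1}^{n} x_i - x_0}{\sqrt{\dfrac{1}{n} \sum_{1}^{n} x_i^2 - \left( \dfrac{1}{n} \sum_{1}^{n} x_i \right)^2}}$$

lies between $u_1$ and $u_2$, is

$$\int_{u_1}^{u_2} \frac{du}{(1 + u^2)^{n/2}} \bigg/ \int_{-\infty}^{\infty} \frac{du}{(1 + u^2)^{n/2}};$$

(ii) the probability, that

$$\sqrt{h} \sqrt{\frac{1}{n} \sum_{1}^{n} x_i^2 - \left( \frac{1}{n} \sum_{1}^{n} x_i \right)^2}$$

lies between $v_1$ and $v_2$, is

$$\int_{v_1}^{v_2} e^{-nv^2} v^{n-2}\, dv \bigg/ \int_0^{\infty} e^{-nv^2} v^{n-2}\, dv.$$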


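If

$$\int_0^{\theta_n} \frac{du}{(1 + u^2)^{n/2}} = \frac{1}{2} \int_0^{\infty} \frac{du}{(1 + u^2)^{n/2}},$$

then $x_0$ is equally likely to lie within or without the range from

$$\frac{1}{n} \sum_{1}^{n} x_i - \theta_n \sqrt{\frac{1}{n} \sum_{1}^{n} x_i^2 - \left( \frac{1}{n} \sum_{1}^{n} x_i \right)^2} \quad \text{to} \quad \frac{1}{n} \sum_{1}^{n} x_i + \theta_n \sqrt{\frac{1}{n} \sum_{1}^{n} x_i^2 - \left( \frac{1}{n} \sum_{1}^{n} x_i \right)^2}.$$

With the phrase that has already been used, the breadth of the 50 per cent. zone as determined by $n$ observations is

$$2\theta_n \sqrt{\frac{1}{n} \sum_{1}^{n} x_i^2 - \left( \frac{1}{n} \sum_{1}^{n} x_i \right)^2}.$$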


For the smaller values of $n$, the value of $\theta_n$ is given by the following table:

     n     θ_n
     2     0.4416
     3     0.3703
     4     0.3249
     5     0.2929
     6     0.2687
     7     0.2497
     8     0.2342
     9     0.2213
    10     0.2104

When $n$ is large, $\theta_n = 0.6745/\sqrt{n}$ nearly.

38. It would appear from the result just obtained that, by sufficiently increasing $n$ the number of observations, the probable difference between $\frac{1}{n} \sum x_i$ and the magnitude to be determined could be made as small as desired. In arriving at this result, it has however been assumed that $x$, the quantity observed, is susceptible of continuous variation. Now in fact this is never the case. What is taken down as the result of an observation is always a rational number. A series of observations will be represented by a series of numbers, say 11.921, 11.937, 11.918, ... which are written down to a certain decimal place, in this case the third; and each observation is represented by a certain integral multiple of 0.001. In taking a reading, say 11.921, what is implied is that the position of the pointer or mark on the scale is nearer to 11.921 than to 11.920 or 11.922, the next two possible adjacent readings. In other words, regarded as a measure of the magnitude, 11.921 and 11.921 ± δ, where δ is a proper fraction not exceeding ½, are not to be distinguished between. It follows that the mean $m$ of the readings and $m$ ± δ, regarded as measures of the magnitude, are not to be distinguished between; and this means that the part of $m$, which extends beyond the third decimal place, is of no significance. The determination arrived at for the magnitude must be an integral multiple of 0.001.

Hence,  in  taking  each  observation  as  an  integer,  it  is  implicitly 
assumed  that  the  quantity  to  be  determined  is  itself  an  integer 
and  that  every  error  is  an  integer. 

If the errors are all integers, the law of error which is most nearly similar to Gauss's law is that, in which the probability of an error $n$ is sensibly

$$\frac{1}{\sqrt{\pi N}}\, e^{-n^2/N}.$$

The sum of the series, of which this is the general term, for all integral values of $n$ exceeds unity; but unless $N$ is very small, the excess is extremely small. For instance, if $N = 3$ the excess is less than $4 \times 10^{-13}$.

Now, assuming this law, if $x_1, x_2, \ldots, x_n$ are the series of observed integral values of a magnitude, and if $s$ is the true integral value, the probability of the set of observations is

$$\frac{1}{(\pi N)^{n/2}}\, e^{-\frac{1}{N} \sum_{1}^{n} (s - x_i)^2}.$$

With

$$s_1 = \frac{1}{n} \sum_{1}^{n} x_i, \qquad s_2 = \sum_{1}^{n} (x_i - s_1)^2,$$

the probability is

$$\frac{1}{(\pi N)^{n/2}}\, e^{-\frac{n(s - s_1)^2 + s_2}{N}}.$$

Whatever $N$ may be, the greatest value of this for integral values of $s$ is given by $s = s_1'$, where $s_1'$ is the nearest integer to $s_1$; while for a varying $N$, the greatest value is given by

$$N = \frac{2 s_2}{n} + 2 (s_1' - s_1)^2.$$
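Both statements about the integral law of error lend themselves to a short numerical check. The following is a sketch under stated assumptions: the function names are hypothetical, the readings are the three quoted in this article expressed as multiples of 0.001, and $s_2$ is taken to be the sum of the squares of the deviations of the readings from their mean.

```python
import math


def discrete_law(n, N):
    """Probability of an integral error n under the law exp(-n^2/N) / sqrt(pi N)."""
    return math.exp(-n * n / N) / math.sqrt(math.pi * N)


def normalisation_excess(N, cutoff=60):
    """Amount by which the sum of the law over all integers exceeds unity."""
    total = math.fsum(discrete_law(n, N) for n in range(-cutoff, cutoff + 1))
    return total - 1.0


def most_probable(xs):
    """Most probable integral value s and the corresponding N for integral readings xs."""
    n = len(xs)
    s1 = sum(xs) / n
    s2 = sum((x - s1) ** 2 for x in xs)       # assumed definition of s_2
    s_best = math.floor(s1 + 0.5)             # nearest integer to s_1
    N_best = 2.0 * s2 / n + 2.0 * (s_best - s1) ** 2
    return s_best, N_best


excess = normalisation_excess(3)
s_best, N_best = most_probable([11921, 11937, 11918])  # readings in units of 0.001

print("excess for N = 3:", excess)
print("most probable s:", s_best, " N:", round(N_best, 3))
```

The excess found for N = 3 is close to $2e^{-3\pi^2}$, the leading term given by the Poisson summation formula, which explains why it falls off so rapidly as N increases.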
NOTE

The  rule  on  p.  4  differs  from  that  usually  given,  mainly  as 
regards  the  condition  of  equal  likelihood.  This  proviso  is  by 
most  writers  put  in  the  form : — provided  that  all  the  n  results 
are  equally  likely.  It  is  evident  that,  so  far  as  the  calculation 
is  concerned,  it  is  immaterial  whether  we  say  "the  n  results  are 
equally  likely"  or  "the  n  results  are  assumed  to  be  equally  likely." 

It  is  not  however  the  same  thing  to  say  "assuming  all  the 
n  results  to  be  equally  likely"  and  "assuming  each  two  of  the 
n  results  to  be  equally  likely."  In  the  one  case,  the  property  of 
equal  likelihood  is  predicated  of  the  n  results  as  a  whole ;  and 
in  the  other,  of  each  pair  of  them. 

Suppose the rule to be modified so that the last clause runs "provided all the $N$ results are assumed to be equally likely." Consider the restricted trial subject to the further limitation that condition A is satisfied. It has just $N_A$ possible results; and in $N_{AB}$ of them, the condition B is satisfied. It is not however possible to say in this case that the probability of condition B being satisfied in the restricted trial is $N_{AB}/N_A$. In order that this may be true, the property of equal likelihood must apply to the set of $N_A$ results as a whole. Does this necessarily follow as a logical consequence from the assumption that the property of equal likelihood applies to the $N$ results as a whole? It can only do so, if there is some criterion for distinguishing a set of results with the property of equal likelihood from a set which does not possess this property. In the absence of any such criterion, it is not possible to say that the probability of condition B being satisfied at the restricted trial is $N_{AB}/N_A$; and therefore, from the modified form of the rule, it is impossible to deduce (p. 6) the formula (iii) of Chapter i

$$p_B = p_{(A_1)B}\, p_{A_1} + p_{(A_2)B}\, p_{A_2} + \cdots + p_{(A_s)B}\, p_{A_s}$$

without further assumptions.

It must in fact be assumed that, for each condition A, all the $N_A$ results which satisfy condition A are equally likely.
So far as setting up the necessary formulae, by which calculable
probabilities  can  be  determined,  is  concerned,  the  last  clause  of 
the  rule  may  be  stated  in  either  of  the  forms : — 

(i)  provided  that  each  two  of  the  N  results  are  assumed  to 
be  equally  likely  ;  or 

(ii) provided that, for each condition A, the $N_A$ results which satisfy condition A are assumed to be equally likely.

Hitherto,  no  criterion  has  ever  been  given  for  distinguishing 
between a set of results, which have the property of equal likelihood, and a set which has not. This is the true justification for
saying  that  "  each  two  of  the  results  are  assumed  to  be  equally 
likely"  rather  than  "each  two  of  the  results  are  equally  likely." 
Table of the integral

$$I = \frac{2}{\sqrt{\pi}} \int_0^{x} e^{-x^2}\, dx$$

for values of $x$ = 0.1, 0.2, ..., 2.9, 3.

     x       I            x       I
    0.1    0.1125        1.6    0.9763
    0.2    0.2227        1.7    0.9838
    0.3    0.3286        1.8    0.9891
    0.4    0.4284        1.9    0.9928
    0.5    0.5205        2.0    0.9953
    0.6    0.6039        2.1    0.9970
    0.7    0.6778        2.2    0.9981
    0.8    0.7421        2.3    0.9989
    0.9    0.7969        2.4    0.9993
    1.0    0.8427        2.5    0.9996
    1.1    0.8802        2.6    0.9998
    1.2    0.9103        2.7    0.99987
    1.3    0.9340        2.8    0.99992
    1.4    0.9523        2.9    0.99996
    1.5    0.9661        3.0    0.99998